NUMBER THEORY REVEALED: 


A MASTERCLASS 


ANDREW GRANVILLE 


AMERICAN MATHEMATICAL SOCIETY




Providence, Rhode Island 


Cover design by Marci Babineau. 


Front cover image of Srinivasa Ramanujan in the playing card: Oberwolfach Photo
Collection, https://opc.mfo.de/; licensed under Creative Commons Attribution Share
Alike 2.0 Germany, https://creativecommons.org/licenses/by-sa/2.0/de/deed.en.


Front cover image of Andrew Wiles in playing card, credit: Alain Goriely. 


2010 Mathematics Subject Classification. Primary 11-01, 11A55, 11B30, 11B39, 11D09, 
11D25, 11N05, 11N25. 


For additional information and updates on this book, visit 
www.ams.org/bookpages/mbk-127 


Library of Congress Cataloging-in-Publication Data 


Cataloging-in-Publication Data has been applied for by the AMS. 
See http://www.loc.gov/publish/cip/.


Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting 
for them, are permitted to make fair use of the material, such as to copy select pages for use 
in teaching or research. Permission is granted to quote brief passages from this publication in 
reviews, provided the customary acknowledgment of the source is given. 

Republication, systematic copying, or multiple reproduction of any material in this publication 
is permitted only under license from the American Mathematical Society. Requests for permission 
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For 
more information, please visit www.ams.org/publications/pubpermissions. 

Send requests for translation rights and licensed reprints to reprint-permission@ams.org. 


© 2019 by the American Mathematical Society. All rights reserved. 
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America. 


The paper used in this book is acid-free and falls within the guidelines 
established to ensure permanence and durability. 
Visit the AMS home page at https://www.ams.org/ 


10 9 8 7 6 5 4 3 2 1        24 23 22 21 20 19


Dedicated to my beloved wife, Marci. 
Writing this book has had its challenges. 
Being the spouse of the author, while writing 
this book, has also had its challenges. 


The enchanting charms of this sublime science 
reveal themselves only to those who have the 
courage to go deeply into it. 
CARL FRIEDRICH GAUSS, 1807 


Contents 


Preface
Gauss’s Disquisitiones Arithmeticae
Notation
The language of mathematics
Prerequisites

Preliminary Chapter on Induction
0.1. Fibonacci numbers and other recurrence sequences
0.2. Formulas for sums of powers of integers
0.3. The binomial theorem, Pascal’s triangle, and the binomial coefficients

Appendices for Preliminary Chapter on Induction
0A. A closed formula for sums of powers
0B. Generating functions
0C. Finding roots of polynomials
0D. What is a group?
0E. Rings and fields
0F. Symmetric polynomials
0G. Constructibility


Chapter 1. The Euclidean algorithm 
1.1. Finding the gcd 
1.2. Linear combinations 
1.3. The set of linear combinations of two integers 


1.4. The least common multiple 


1.5. Continued fractions
1.6. Tiling a rectangle with squares

Appendices for Chapter 1:
1A. Reformulating the Euclidean algorithm
1B. Computational aspects of the Euclidean algorithm
1C. Magic squares
1D. The Frobenius postage stamp problem
1E. Egyptian fractions


Chapter 2. Congruences 


2.1. Basic congruences
2.2. The trouble with division
2.3. Congruences for polynomials
2.4. Tests for divisibility

Appendices for Chapter 2:
2A. Congruences in the language of groups
2B. The Euclidean algorithm for polynomials


Chapter 3. The basic algebra of number theory 


3.1. The Fundamental Theorem of Arithmetic
3.2. Abstractions
3.3. Divisors using factorizations
3.4. Irrationality
3.5. Dividing in congruences
3.6. Linear equations in two unknowns
3.7. Congruences to several moduli
3.8. Square roots of 1 (mod n)

Appendices for Chapter 3:
3A. Factoring binomial coefficients and Pascal’s triangle modulo p
3B. Solving linear congruences
3C. Groups and rings
3D. Unique factorization revisited
3E. Gauss’s approach
3F. Fundamental theorems and factoring polynomials
3G. Open problems


Chapter 4. Multiplicative functions 


4.1. Euler’s $\phi$-function
4.2. Perfect numbers. “The whole is equal to the sum of its parts.”

Appendices for Chapter 4:
4A. More multiplicative functions
4B. Dirichlet series and multiplicative functions
4C. Irreducible polynomials modulo p
4D. The harmonic sum and the divisor function
4E. Cyclotomic polynomials


Chapter 5. The distribution of prime numbers 


5.1. Proofs that there are infinitely many primes 
5.2. Distinguishing primes 

5.3. Primes in certain arithmetic progressions 
5.4. How many primes are there up to x? 

5.5. Bounds on the number of primes 

5.6. Gaps between primes 

5.7. Formulas for primes 


Appendices for Chapter 5:
5A. Bertrand’s postulate and beyond
Bonus read: A review of prime problems
Prime values of polynomials in one variable
Prime values of polynomials in several variables
Goldbach’s conjecture and variants
5B. An important proof of infinitely many primes
5C. What should be true about primes?
5D. Working with Riemann’s zeta-function
5E. Prime patterns: Consequences of the Green-Tao Theorem
5F. A panoply of prime proofs
5G. Searching for primes and prime formulas
5H. Dynamical systems and infinitely many primes


Chapter 6. Diophantine problems 


6.1. The Pythagorean equation 

6.2. No solutions to a Diophantine equation through descent 
6.3. Fermat’s “infinite descent” 

6.4. Fermat’s Last Theorem 


Appendices for Chapter 6:
6A. Polynomial solutions of Diophantine equations
6B. No Pythagorean triangle of square area via Euclidean geometry
6C. Can a binomial coefficient be a square?


Chapter 7. Power residues 


7.1. Generating the multiplicative group of residues
7.2. Fermat’s Little Theorem
7.3. Special primes and orders
7.4. Further observations
7.5. The number of elements of a given order, and primitive roots
7.6. Testing for composites, pseudoprimes, and Carmichael numbers
7.7. Divisibility tests, again
7.8. The decimal expansion of fractions
7.9. Primes in arithmetic progressions, revisited

Appendices for Chapter 7:
7A. Card shuffling and Fermat’s Little Theorem
7B. Orders and primitive roots
7C. Finding nth roots modulo prime powers
7D. Orders for finite groups
7E. Constructing finite fields
7F. Sophie Germain and Fermat’s Last Theorem
7G. Primes of the form $2^n + k$
7H. Further congruences
7I. Primitive prime factors of recurrence sequences


Chapter 8. Quadratic residues 


8.1. Squares modulo prime p
8.2. The quadratic character of a residue
8.3. The residue $-1$
8.4. The residue 2
8.5. The law of quadratic reciprocity
8.6. Proof of the law of quadratic reciprocity
8.7. The Jacobi symbol
8.8. The squares modulo m

Appendices for Chapter 8:
8A. Eisenstein’s proof of quadratic reciprocity
8B. Small quadratic non-residues
8C. The first proof of quadratic reciprocity
8D. Dirichlet characters and primes in arithmetic progressions
8E. Quadratic reciprocity and recurrence sequences


Chapter 9. Quadratic equations 


9.1. Sums of two squares
9.2. The values of $x^2 + dy^2$
9.3. Is there a solution to a given quadratic equation?
9.4. Representation of integers by $ax^2 + by^2$ with $x, y$ rational, and beyond
9.5. The failure of the local-global principle for quadratic equations in integers
9.6. Primes represented by $x^2 + 5y^2$

Appendices for Chapter 9:
9A. Proof of the local-global principle for quadratic equations
9B. Reformulation of the local-global principle
9C. The number of representations
9D. Descent and the quadratics


Chapter 10. Square roots and factoring 


10.1. Square roots modulo n
10.2. Cryptosystems
10.3. RSA
10.4. Certificates and the complexity classes P and NP
10.5. Polynomial time primality testing
10.6. Factoring methods

Appendices for Chapter 10:
10A. Pseudoprime tests using square roots of 1
10B. Factoring with squares
10C. Identifying primes of a given size
10D. Carmichael numbers
10E. Cryptosystems based on discrete logarithms
10F. Running times of algorithms
10G. The AKS test
10H. Factoring algorithms for polynomials


Chapter 11. Rational approximations to real numbers 


11.1. The pigeonhole principle
11.2. Pell’s equation
11.3. Descent on solutions of $x^2 - dy^2 = n$, $d > 0$
11.4. Transcendental numbers
11.5. The abc-conjecture

Appendices for Chapter 11:
11A. Uniform distribution
11B. Continued fractions
11C. Two-variable quadratic equations
11D. Transcendental numbers


Chapter 12. Binary quadratic forms 


12.1. Representation of integers by binary quadratic forms
12.2. Equivalence classes of binary quadratic forms
12.3. Congruence restrictions on the values of a binary quadratic form
12.4. Class numbers
12.5. Class number one

Appendices for Chapter 12:
12A. Composition rules: Gauss, Dirichlet, and Bhargava
12B. The class group
12C. Binary quadratic forms of positive discriminant
12D. Sums of three squares
12E. Sums of four squares
12F. Universality
12G. Integers represented in Apollonian circle packings


Chapter 13. The anatomy of integers 


13.1. Rough estimates for the number of integers with a fixed number of prime factors
13.2. The number of prime factors of a typical integer
13.3. The multiplication table problem
13.4. Hardy and Ramanujan’s inequality

Appendices for Chapter 13:
13A. Other anatomies
13B. Dirichlet L-functions


Chapter 14. Counting integral and rational points on curves, modulo p 


14.1. Diagonal quadratics
14.2. Counting solutions to a quadratic equation and another proof of quadratic reciprocity
14.3. Cubic equations modulo p
14.4. The equation $E_b: y^2 = x^3 + b$
14.5. The equation $y^2 = x^3 + ax$
14.6. A more general viewpoint on counting solutions modulo p

Appendices for Chapter 14:
14A. Gauss sums


Chapter 15. Combinatorial number theory 


15.1. Partitions
15.2. Jacobi’s triple product identity
15.3. The Freiman-Ruzsa Theorem
15.4. Expansion and the Plünnecke-Ruzsa inequality
15.5. Schnirel’man’s Theorem
15.6. Classical additive number theory
15.7. Challenging problems

Appendices for Chapter 15:
15A. Summing sets modulo p
15B. Summing sets of integers


Chapter 16. The p-adic numbers 


16.1. The p-adic norm
16.2. p-adic expansions
16.3. p-adic roots of polynomials
16.4. p-adic factors of a polynomial
16.5. Possible norms on the rationals
16.6. Power series convergence and the p-adic logarithm
16.7. The p-adic dilogarithm


Chapter 17. Rational points on elliptic curves 


17.1. The group of rational points on an elliptic curve
17.2. Congruent number curves
17.3. No non-trivial rational points by descent
17.4. The group of rational points of $y^2 = x^3 - x$
17.5. Mordell’s Theorem: $E_A(\mathbb{Q})$ is finitely generated
17.6. Some nice examples

Appendices for Chapter 17:
17A. General Mordell’s Theorem
17B. Pythagorean triangles of area 6
17C. 2-parts of abelian groups
17D. Waring’s problem


Hints for exercises 


Recommended further reading 


Index 




Preface 


This is a modern introduction to number theory, aimed at several different audi- 
ences: students who have little experience of university level mathematics, students 
who are completing an undergraduate degree in mathematics, as well as students 
who are completing a mathematics teaching qualification. Like most introductions 
to number theory, our contents are largely inspired by Gauss’s Disquisitiones Arith- 
meticae (1801), though we also include many modern developments. We have gone 
back to Gauss to borrow several excellent examples to highlight the theory. 


There are many different topics that might be included in an introductory 
course in number theory, and others, like the law of quadratic reciprocity, that surely 
must appear in any such course. The first dozen chapters of the book therefore 
present a “standard” course. In the masterclass version of this book we flesh out 
these topics, in copious appendices, as well as adding five additional chapters on 
more advanced themes. In the introductory version we select an appendix for each
chapter that might be most useful as supplementary material.¹ A “minimal” course
might focus on the first eight chapters and at least one of chapters 9 and 10.²


Much of modern mathematics germinated from number-theoretic seed and one 
of our goals is to help the student appreciate the connection between the relatively 
simply defined concepts in number theory and their more abstract generalizations 
in other courses. For example, our appendices allow us to highlight how mod- 
ern algebra stems from investigations into number theory and therefore serve as 
an introduction to algebra (including rings, modules, ideals, Galois theory, p-adic 
numbers,...). These appendices can be given as additional reading, perhaps as 
student projects, and we point the reader to further references. 

Following Gauss, we often develop examples before giving a formal definition 
and a theorem, firstly to see how the concept arises naturally, secondly to conjecture 
a theorem that describes an evident pattern, and thirdly to see how a proof of the 
theorem emerges from understanding some non-trivial examples. 


¹ In the main text we occasionally refer to appendices that only appear in the masterclass version.
² Several sections might be discarded; their headings are in bold italics.


Why study number theory? Questions arise when studying any subject, some- 
times fascinating questions that may be difficult to answer precisely. Number theory 
is the study of the most basic properties of the integers, literally taking integers 
apart to see how they are built, and there we find an internal beauty and coherence 
that encourages many of us to seek to understand more. Facts are often revealed by 
calculations, and then researchers seek proofs. Sometimes the proofs themselves, 
even more than the theorems they prove, have an elegance that is beguiling and 
reveal that there is so much more to understand. With good reason, Gauss called 
number theory the “Queen of Mathematics”, ever mysterious, but nonetheless graciously
sharing with those who find themselves interested. In this first course there
is much that is accessible, while at the same time natural, easily framed questions
arise that remain open, stumping the brightest minds.


Once celebrated as one of the more abstract subjects in mathematics, today 
there are scores of applications of number theory in the real world, particularly to 
the theory and practice of computer algorithms. Best known is the use of number 
theory in designing cryptographic protocols (as discussed in chapter 10), hiding 
our secrets behind the seeming difficulty of factoring large numbers which only 
have large prime factors. 


For some students, studying number theory is a life-changing experience: They 
find themselves excited to go on to penetrate more deeply, or perhaps to pursue 
some of the fascinating applications of the subject. 


Why give proofs? We give proofs to convince ourselves and others that our 
reasoning is correct. Starting from agreed upon truths, we try to derive a further 
truth, being explicit and precise about each step of our reasoning. A proof must 
be readable by people besides the author. It is a way of communicating ideas and 
needs to be persuasive, not just to the writer but also to a mathematically literate 
person who cannot obtain further clarification from the writer on any point that is 
unclear. It is not enough that the writer believes it; it must be clear to others. The 
burden of proof lies with the author. 


The word “proof” can mean different things in different disciplines. In some 
disciplines a “proof” can be several different examples that justify a stated hypoth- 
esis, but this is inadequate in mathematics: One can have a thousand examples that 
work as predicted by the hypothesis, but the thousand and first might contradict 
it. Therefore to “prove” a theorem, one must build an incontrovertible argument 
up from first principles, so that the statement must be true in every case, assuming 
that those first principles are true. 


Occasionally we give more than one proof of an important theorem, to highlight 
how inevitably the subject develops, as well as to give the instructor different 
options for how to present the material. (Few students will benefit from seeing 
all of the proofs on their first time encountering this material.) 


Motivation. Challenging mathematics courses, such as point-set topology, al- 
gebraic topology, measure theory, differential geometry, and so on, tend to be dom- 
inated at first by formal language and requirements. Little is given by way of 
motivation. Sometimes these courses are presented as a prerequisite for topics that 
will come later. There is little or no attempt to explain what all this theory is good
for or why it was developed in the first place. Students are expected to subject
themselves to the course, motivated primarily by trust.


How boring! Mathematics surely should not be developed only for those few 
who already know that they wish to specialize and have a high tolerance for bore- 
dom. We should help our students to appreciate and cherish the beauty of math- 
ematics. Surely courses should be motivated by a series of interesting questions. 
The right questions will highlight the benefits of an abstract framework, so that 
the student will wish to explore even the most rarified paths herself, as the bene- 
fits become obvious. Number theory does not require much in the way of formal 
prerequisites, and there are easy ways to justify most of its abstraction. 


In this book, we hope to capture the attention and enthusiasm of the reader 
with the right questions, guiding her as she embarks for the first time on this 
fascinating journey. 


Student expectations. For some students, number theory is their first course 
that formulates abstract statements of theorems, which can take them outside of 
their “comfort zone”. This can be quite a challenge, especially as high school 
pedagogy moves increasingly to training students to learn and use sophisticated 
techniques, rather than appreciate how those techniques arose. We believe that 
one can best use (and adapt) methods if one fully appreciates their genesis, so 
we make no apologies for this feature of the elementary number theory course. 
However, this means that some students will be forced to adjust their personal
expectations. Future teachers sometimes ask why they need to learn material, 
and take a perspective, so far beyond what they will be expected to teach in high 
school. There are many answers to this question; one is that, in the long term, the 
material in high school will be more fulfilling if one can see its long-term purpose. A 
second response is that every teacher will be confronted by students who are bored 
with their high school course and desperately seeking harder intellectual challenges 
(whether they realize it themselves or not); the first few chapters of this book should 
provide the kind of intellectual stimulation those students need. 


Exercises. Throughout the book, there are a lot of problems to be solved. Easy 
questions, moderate questions, hard questions, exceptionally difficult questions. No 
one should do them all. The idea of having so many problems is to give the teacher 
options that are suitable for the students’ backgrounds: 


An unusual feature of the book is that exercises appear embedded in the text.³
This is done to enable the student to complete the proofs of theorems as one goes
along.⁴ This does not require the students to come up with new ideas but rather to
follow the arguments given so as to fill in the gaps. For less experienced students it 
helps to write out the solutions to these exercises; more experienced students might 
just satisfy themselves that they can provide an appropriate proof. 


³ Though they can be downloaded, as a separate list, from www.ams.org/granville-number-theory.
⁴ Often students have little experience with proofs and struggle with the level of sophistication
required, at least without adequate guidance.


Other questions work through examples. There are more challenging exercises 
throughout, indicated by the symbol ? next to the question numbers, in which the 
student will need to independently bring together several of the ideas that have been 
discussed. Then there are some really tough questions, indicated by the symbol -, 
in which the student will need to be creative, perhaps even providing ideas not 
given, or hinted at, in the text. 


A few questions in this book are open-ended, some even phrased a little
misleadingly. The student who tries to develop those themes her- or himself might
embark upon a rewarding voyage of discovery. Once, after I had set the exercises
in section 9.2 for homework, some students complained about how unfair they felt these
questions were, but they were silenced by another student who announced that it was so
much fun for him to work out the answers that he now knew what he wanted to do
with his life!


At the end of the book we give hints for many of the exercises, especially those 
that form part of a proof. 


Special features of our syllabus. Number theory sometimes serves as an intro- 
duction to “proof techniques”. We give many exercises to practice those techniques, 
but to make it less boring, we do so while developing certain themes as the book 
progresses, for example, the theory of recurrence sequences and properties of
binomial coefficients. We dedicate a preliminary chapter to induction and use it to
develop the theory of sums of powers. Here is a list of the main supplementary 
themes which appear in the book: 


Special numbers: Bernoulli numbers; binomial coefficients and Pascal’s triangle; 
Fermat and Mersenne numbers; and the Fibonacci sequence and general second- 
order linear recurrences. 


Subjects in their own right: Algebraic numbers, integers, and units; computation
and running times; continued fractions; dynamics; groups, especially of
matrices; factoring methods and primality testing; ideals; irrationals and transcen- 
dentals; and rings and fields. 


Formulas for cyclotomic polynomials, Dirichlet L-functions, the Riemann zeta- 
function, and sums of powers of integers. 


Interesting issues: Lifting solutions; polynomial properties; resultants and dis- 
criminants; roots of polynomials, constructibility and pre-Galois theory; square 
roots (mod n); and tests for divisibility. 


Fun and famous problems like the abc-conjecture, Catalan’s conjecture, Egyp- 
tian fractions, Fermat’s Last Theorem, the Frobenius postage stamp problem, magic 
squares, primes in arithmetic progressions, tiling with rectangles and with circles. 


Our most unconventional choice is to give a version of Rousseau’s proof of the
law of quadratic reciprocity, which is directly motivated by Gauss’s proof of
Wilson’s Theorem. This proof avoids Gauss’s Lemma, so it is a lot easier for a beginning
student than Eisenstein’s elegant proof (which we give in section 8.10 of appendix
8A). Gauss’s original proof of quadratic reciprocity is more motivated by the
introductory material, although a bit more complicated than these other two proofs.

We include Gauss’s original proof in appendix 8C, and we also
understand (2/n) in his way, in the basic course, to interest the reader. We present
several other proofs, including a particularly elegant proof using Gauss sums in
section 14.7.


Further exploration of number theory. There is a tremendous leap in the level 
of mathematical knowledge required to take graduate courses in number theory, 
because curricula expect the student to have taken (and appreciated) several other 
relevant courses. This is a shame since there is so much beautiful advanced material 
that is easily accessible after finishing an introductory course. Moreover, it can be 
easier to study other courses, if one already understands their importance, rather 
than taking it on trust. Thus this book, Number Theory Revealed, is designed 
to lead to two subsequent books, which develop the two main thrusts of number 
theory research: 


In The distribution of primes: Analytic number theory revealed, we will discuss 
how number theorists have sought to develop the themes of chapter 5 (as well as 
chapters 4 and 13). In particular we prove the prime number theorem, based 
on the extraordinary ideas of Riemann. This proof rests heavily on certain ideas 
from complex analysis, which we will outline in a way that is relevant for a good 
understanding of the proofs. 


In Rational points on curves: Arithmetic geometry revealed, we look at solu- 
tions to Diophantine equations, especially those of degree two and three, extending 
the ideas of chapter 12 (as well as chapters 14 and 17). In particular we will prove 
Mordell’s Theorem (developed here in special cases in chapter 17) and gain a basic 
understanding of modular forms, outlining some of the main steps in Wiles’s proof 
of Fermat’s Last Theorem. We avoid a deep understanding of algebraic geometry, 
instead proceeding by more elementary techniques and a little complex analysis 
(which we explain). 


References. There is a list of great number theory books at the end of our 
book and references that are recommended for further reading at the end of many 
chapters and appendices. Unlike most textbooks, I have chosen not to include a
reference to every result stated, nor necessarily to most of the relevant articles, but rather
to focus on a smaller number that might be accessible to the reader. Moreover, many
readers are used to searching online for keywords; this works well for many themes
in mathematics.⁵ However, the student researching online should be warned that
Wikipedia articles are often out of date, sometimes misleading, and too often poorly
written. It is best to try to find relevant articles published in expository research
journals, such as the American Mathematical Monthly,⁶ or posted at arxiv.org, which
is “open access”, to supplement the course material.


⁵ Though finding just the right phrasing to locate an article at the right level can be challenging.
⁶ Although this is behind a paywall, it can be accessed, like many journals, by logging on from most
universities, which have paid subscriptions for their students and faculty.

The cover (designed by Marci Babineau and the author).

In 1675, Isaac Newton explained his extraordinary breakthroughs in physics and
mathematics by claiming, “If I have seen further it is by standing on the shoulders
of Giants.” Science has always developed this way, no more so than in the theory
of numbers. Our cover represents five giants of number theory, in a fan of cards,
each of whose work built upon the previous luminaries.


Modern number theory was born from PIERRE DE FERMAT’s readings of the
ancient Greek texts (as discussed in section 6.1) in the mid-17th century, and his
enunciation of various results including his tantalizingly difficult to prove “Last 
Theorem.” His “Little Theorem” (chapter 7) and his understanding of sums of two 
squares (chapter 9) are part of the basis of the subject. 


The first modern number theory book, Gauss’s Disquisitiones Arithmeticae, on 
which this book is based, was written by CARL FRIEDRICH GAUSS at the beginning 
of the 19th century. As a teenager, Gauss rethought many of the key ideas in number 
theory, especially the law of quadratic reciprocity (chapter 8) and the theory of 
binary quadratic forms (chapter 12), as well as inspiring our understanding of the 
distribution of primes (chapter 5). 


Gauss’s contemporary SOPHIE GERMAIN made perhaps the first great effort to
attack Fermat’s Last Theorem (her effort is discussed in appendix 7F). Developing
her work inspired my own first research efforts. 


SRINIVASA RAMANUJAN, born in poverty in India at the end of the 19th cen- 
tury, was the most talented untrained mathematician in history, producing some 
extraordinary results before dying at the age of 32. He was unable to satisfactorily 
explain many of his extraordinary insights which penetrated difficult subjects far 
beyond the more conventional approaches. (See appendix 12F and chapters 13, 15, 
and 17.) Some of his identities are still inspiring major developments today in both 
mathematics and physics. 


ANDREW WILES sits atop our deck. His 1994 proof of Fermat’s Last Theorem 
built on the ideas of the four mathematicians mentioned above and very many
other “giants” besides. His great achievement is a testament to the success of 
science building on solid grounds. 


Thanks. I would like to thank the many inspiring mathematicians who have 
helped me shape my view of elementary number theory, most particularly Béla
Bollobás, Paul Erdős, D. H. Lehmer, James Maynard, Ken Ono, Paulo Ribenboim,
Carl Pomerance, John Selfridge, Dan Shanks, and Hugh C. Williams, as well
as those people who have participated in developing the relatively new subject of
“additive combinatorics” (see sections 15.3, 15.4, and 15.6). Several people
have shared insights or new works that have made their way into this book:
Stephanie Chan, Leo Goldmakher, Richard Hill, Alex Kontorovich, Jennifer Park, 
and Richard Pinch. The six anonymous reviewers added some missing perspec- 
tives and Olga Balkanova, Stephanie Chan, Patrick Da Silva, Tristan Freiberg, 
Ben Green, Mariah Hamel, Jorge Jimenez, Nikoleta Kalaydzhieva, Dimitris Kouk- 
oulopoulos, Youness Lamzouri, Jennifer Park, Sam Porritt, Ethan Smith, Anitha 
Srinivasan, Paul Voutier, and Max Wenqiang Xu kindly read subsections of the 
near-final draft, making valuable comments. 


Gauss’s Disquisitiones Arithmeticae


In July 1801, Carl Friedrich Gauss published Disquisitiones Arithmeticae, a book 
on number theory, written in Latin. It had taken five years to write but was im- 
mediately recognized as a great work, both for the new ideas and its accessible 
presentation. Gauss was then widely considered to be the world’s leading mathe- 
matician, and today we rate him as one of the three greatest in history, alongside 
Archimedes and Sir Isaac Newton. 


The first four chapters of Disquisitiones Arithmeticae consist of essentially the 
same topics as our course today (with suitable modifications for advances made in 
the last two hundred years). His presentation of ideas is largely the model upon 
which modern mathematical writing is based. There follow several chapters on qua- 
dratic forms and then on the rudiments of what we would call Galois theory today, 
most importantly the constructibility of regular polygons. Finally, the publisher 
felt that the book was long enough, and several further chapters did not appear in 
the book (though Dedekind published Gauss’s disorganized notes, in German, after 
Gauss’s death). 


One cannot overestimate the importance of Disquisitiones to the development
of 19th-century mathematics. It led, besides many other things, to Dirichlet’s
formulation of ideals (see sections 3.19 and 3.20 of appendix 3D, section 12.8 of
appendix 12A, and appendix 12B), and the exploration of the geometry of the upper
half-plane (see Theorem 1.2 and the subsequent discussion).


As a young man, Dirichlet took his copy of Disquisitiones with him wherever 
he went. He even slept with it under his pillow. In his old age, it remained his most
prized possession, even though it was in tatters. It was translated into French in
1807, German in 1889, Russian in 1959, English only in 1965, Spanish and Japanese 
in 1995, and Catalan in 1996! 




Disquisitiones is no longer read by many people. The notation is difficult. The 
assumptions about what the reader knows do not fit today’s reader (for example, 
neither linear algebra nor group theory had been formulated by the time Gauss 
wrote his book, although Disquisitiones would provide some of the motivation for 
developing those subjects). Yet, many of Gauss’s proofs are inspiring, and some 
have been lost to today’s literature. Moreover, although the more advanced two- 
thirds of Disquisitiones focus on binary quadratic forms and have led to many of 
today’s developments, there are several themes there that are not central to today’s 
research. In the fourth book in our trilogy (!), Gauss’s Disquisitiones Arithmeticae 
revealed, we present a reworking of Gauss’s classic, rewriting it in modern notation, 
in a style more accessible to the modern reader. We also give the first English 
version of the missing chapters, which include several surprises. 


Notation 


N — The natural numbers, 1, 2, 3, ....

Z — The integers, ..., −3, −2, −1, 0, 1, 2, 3, ....
Throughout, all variables are taken to be integers, unless otherwise specified.
Usually p, and sometimes q, will denote prime numbers.

Q — The rational numbers, that is, the fractions a/b with a ∈ Z and b ∈ N.

R — The real numbers. 


C — The complex numbers. 


$$\sum_{\substack{\text{some variables:}\\ \text{certain conditions hold}}} \text{summand} \qquad\text{and}\qquad \prod_{\substack{\text{some variables:}\\ \text{certain conditions hold}}} \text{summand}$$

mean that we sum, or multiply, the summand over the integer values of some variable
satisfying certain conditions.

Brackets and parentheses: There are all sorts of brackets and parentheses in
mathematics. It is helpful to adopt conventions under which they carry meaning, so
that we do not have to repeat ourselves too often, as we will see in the notation below. But we
also use them in equations; usually we surround an expression with “(” and “)” to 
be clear where the expression begins and ends. If too many of these are used in one 
line, then we might use different sizes or even “{” and “}” instead. If the brackets 
have a particular meaning, then the reader will be expected to discern that from 
the context. 


A[x] — The set of polynomials with coefficients from the set A, that is,
$f(x) = \sum_{i=0}^{n} a_i x^i$ where each $a_i \in A$. Mostly we work with A = Z.


A(x) — The set of rational functions with coefficients from the set A, in other words,
functions f(x)/g(x) where f(x), g(x) ∈ A[x] and g(x) ≠ 0.


[t] — The integer part of t, that is, the largest integer ≤ t.




{t} — The fractional part of (real number) t, that is, {t} = t − [t]. Notice that
0 ≤ {t} < 1.

(a, b) — The greatest common divisor of a and b.

[a, b] — The least common multiple of a and b.

b|a — Means b divides a. 

$p^k \| a$ — Means $p^k$ divides a, but not $p^{k+1}$ (where p is prime). In other words, k is
the “exact power” of p dividing a.

I(a, b) — The set {am + bn : m, n ∈ Z}, which is called the ideal generated by a
and b over Z.


log — The logarithm in base e, the natural logarithm, which is often denoted by
“ln” in earlier courses.

Parity — The parity of an integer is either even (if it is divisible by 2) or odd (if it 
is not divisible by 2). 


The language of mathematics 


“By a conjecture we mean a proposition that has not yet been proven but which is 
favored by some serious evidence. It may be a significant amount of computational 
evidence, or a body of theory and technique that has arisen in the attempt to settle 
the conjecture. 


An open question is a problem where the evidence is not very convincing one 
way or the other. 


A theorem, of course, is something that has been proved. There are important 
theorems, and there are unimportant (but perhaps curious) theorems. 


The distinction between open question and conjecture is, it is true, somewhat 
subjective, and different mathematicians may form different judgements concerning 
a particular problem. We trust that there will be no similar ambiguity concerning 
the theorems.” 


— Dan Shanks [Sha85, p. 2]

Today we might add to this list the heuristic argument, in which we explore an open
question with techniques that help give us a good idea of what to conjecture, even
if those techniques are unlikely to lead to a formal proof.


Prerequisites 


The reader should be familiar with the commonly used sets of numbers N, Z, and Q,
as well as polynomials with integer coefficients, denoted by Z[x]. Proofs will often
use the principle of induction; that is, if S(n) is a given mathematical assertion,
dependent on the integer n, then to prove that it is true for all n ∈ N, we need only
prove the following:


• S(1) is true.

• S(k) is true implies that S(k + 1) is true, for all integers k ≥ 1.
The example that is usually given to highlight the principle of induction is the
statement “$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$”, which we denote by S(n).¹ For n = 1 we
check that $1 = \frac{1 \cdot 2}{2}$, and so S(1) is true. For any k ≥ 1, we assume that S(k) is true
and then deduce that

$$\begin{aligned} 1 + 2 + 3 + \cdots + (k+1) &= (1 + 2 + 3 + \cdots + k) + (k+1)\\ &= \frac{k(k+1)}{2} + (k+1) \qquad \text{as } S(k) \text{ is true}\\ &= \frac{(k+1)(k+2)}{2}; \end{aligned}$$

that is, S(k + 1) is true. Hence, by the principle of induction, we deduce that S(n)
is true for all integers n ≥ 1.


To highlight the technique of induction with more examples, we give formulas for
the values of the terms of recurrence sequences (like the Fibonacci numbers) in
section 0.1, and develop the theory of sums of powers of integers (for example, we
prove a statement which gives a formula for $1^2 + 2^2 + \cdots + n^2$ for each integer
n ≥ 1) in section 0.2.


1 There are other, easier, proofs of this assertion, but induction will be the only viable technique
to prove some of the more difficult theorems in the course, which is why we highlight the technique here.




Induction and the least counterexample: Induction can be slightly disguised. For
example, sometimes one proves that a statement T(n) is true for all n ≥ 1, by
supposing that it is false for some n and looking for a contradiction. If T(n) is
false for some n, then there must be a least integer m for which T(m) is false. The
trick is to use the assumption that T(m) is false to prove that there exists some
smaller integer k, 1 ≤ k < m, for which T(k) is also false. This contradicts the
minimality of m, and therefore T(n) must be true for all n ≥ 1. Such proofs are
easily reformulated into an induction proof:


Let S(n) be the statement that T(1), T(2), ..., T(n) all hold. The induction
proof then works, for if S(m − 1) is true but S(m) is false, then T(m) is false and
so, by the previous paragraph, T(k) is false for some integer k, 1 ≤ k ≤ m − 1,
which contradicts the assumption that S(m − 1) is true.


A beautiful example is given by the statement, “Every integer > 1 has a prime
divisor.” (A prime number is an integer > 1, such that the only positive integers
that divide it are 1 and itself.) Let T(n) be the statement that n has a prime
divisor, and let S(n) be the statement that T(2), T(3), ..., T(n) all hold. Evidently
S(2) = T(2) is true since 2 is prime. We suppose that S(k) is true (so that
T(2), T(3), ..., T(k) all hold). Now:

Either k + 1 is itself a prime number, in which case T(k + 1) holds and therefore
S(k + 1) holds.


Or k + 1 is not prime, in which case it has a divisor d which is not equal to either
1 or k + 1, and so 2 ≤ d ≤ k. But then S(d) holds by the induction hypothesis,
and so there is some prime p which divides d, and therefore divides k + 1. Hence
T(k + 1) holds and therefore S(k + 1) holds.


(The astute reader might ask whether certain “facts” that we have used here deserve 
a proof. For example, if a prime p divides d, and d divides k + 1, then p divides 
k + 1. We have also assumed the reader understands that when we write “d divides
k +1” we mean that when we divide k + 1 by d, the remainder is zero. One of our 
goals at the beginning of the course is to make sure that everyone interprets these 
simple facts in the same way, by giving as clear definitions as possible and outlining 
useful, simple deductions from these definitions.) 


Chapter 0 


Preliminary Chapter 
on Induction 


Induction is an important proof technique in number theory. This preliminary 
chapter gives the reader the opportunity to practice its use, while learning about 
some intriguing number-theoretic concepts. 


0.1. Fibonacci numbers and other recurrence sequences 


The Fibonacci numbers, perhaps the most famous sequence of integers, begin with 


$$F_0 = 0,\ F_1 = 1,\ F_2 = 1,\ F_3 = 2,\ 3,\ 5,\ 8,\ 13,\ 21,\ 34,\ 55,\ 89,\ 144,\ 233,\ \ldots$$


The Fibonacci numbers appear in many places in mathematics and its applications.¹
They obey a rule giving each term of the Fibonacci sequence in terms of
the recent history of the sequence:


$$F_n = F_{n-1} + F_{n-2} \quad \text{for all integers } n \ge 2.$$


We call this a recurrence relation. It is not difficult to find a formula for $F_n$:

$$F_n = \frac{1}{\sqrt{5}}\left(\left(\frac{1+\sqrt{5}}{2}\right)^{n} - \left(\frac{1-\sqrt{5}}{2}\right)^{n}\right) \quad \text{for all integers } n \ge 0, \tag{0.1.1}$$

where $\frac{1+\sqrt{5}}{2}$ and $\frac{1-\sqrt{5}}{2}$ each satisfy the equation $x + 1 = x^2$. Having such an explicit
formula for the Fibonacci numbers makes them easy to work with, but there is a
problem. It is not obvious from this formula that every Fibonacci number is an
integer; however, that does follow easily from the original recurrence relation.²


1 Typically when considering a biological process whose current state depends on its past, such as
evolution and brain development.

2 It requires quite sophisticated ideas to decide whether a given complicated formula like (0.1.1) is
an integer or not. Learn more about this in appendix 0F on symmetric polynomials.




Exercise 0.1.1. (a) Use the recurrence relation for the Fibonacci numbers, and induction to 
prove that every Fibonacci number is an integer. 
(b) Prove that (0.1.1) is correct by verifying that it holds for n = 0, 1 and then, for all larger
integers n, by induction. 


Exercise 0.1.2. Use induction on n ≥ 1 to prove that
(a) $F_1 + F_3 + \cdots + F_{2n-1} = F_{2n}$ and
(b) $1 + F_2 + F_4 + \cdots + F_{2n} = F_{2n+1}$.


The number $\phi = \frac{1+\sqrt{5}}{2}$ is called the golden ratio; one can show that $F_n$ is the
nearest integer to $\phi^n/\sqrt{5}$.


Exercise 0.1.3. (a) Prove that φ satisfies $\phi^2 = \phi + 1$.
(b) Prove that $\phi^n = F_n\phi + F_{n-1}$ for all integers n ≥ 1, by induction.
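The claims above are easy to test numerically before proving them. The following is a minimal Python sketch (a sanity check for small n, not a proof): it computes $F_n$ exactly from the recurrence and compares it both with (0.1.1) and with the nearest integer to $\phi^n/\sqrt{5}$, each evaluated in floating point. The helper names `fibonacci_list` and `binet` are our own, and the cut-off n ≤ 40 is an assumption chosen so that double-precision rounding error stays far below 1/2.

```python
import math


def fibonacci_list(n_max):
    """Return [F_0, ..., F_{n_max}] computed exactly from F_n = F_{n-1} + F_{n-2}."""
    fibs = [0, 1]
    while len(fibs) <= n_max:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[: n_max + 1]


def binet(n):
    """Evaluate the right-hand side of (0.1.1) in floating point."""
    sqrt5 = math.sqrt(5)
    return (((1 + sqrt5) / 2) ** n - ((1 - sqrt5) / 2) ** n) / sqrt5


PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

for n, f in enumerate(fibonacci_list(40)):
    assert round(binet(n)) == f                 # (0.1.1), up to rounding error
    assert round(PHI ** n / math.sqrt(5)) == f  # F_n is the nearest integer to phi^n / sqrt(5)

print(fibonacci_list(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

For much larger n, floating point is no longer trustworthy, which is one reason the exact recurrence (or exercise 0.1.3(b)) is preferable for computation.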


Any sequence $x_0, x_1, x_2, \ldots$, for which the terms $x_n$, with n ≥ 2, are defined
by the equation

$$x_n = a x_{n-1} + b x_{n-2} \quad \text{for all } n \ge 2, \tag{0.1.2}$$

where $a, b, x_0, x_1$ are given, is called a second-order linear recurrence sequence.
Although this is a vast generalization of the Fibonacci numbers, one can still prove
a formula for the general term, $x_n$, analogous to (0.1.1). We begin by factoring the
polynomial

$$x^2 - ax - b = (x - \alpha)(x - \beta)$$

for the appropriate $\alpha, \beta \in \mathbb{C}$ (we had $x^2 - x - 1 = \left(x - \frac{1+\sqrt{5}}{2}\right)\left(x - \frac{1-\sqrt{5}}{2}\right)$ for the
Fibonacci numbers). If $\alpha \ne \beta$, then there exist coefficients $c_\alpha, c_\beta$ for which

$$x_n = c_\alpha \alpha^n + c_\beta \beta^n \quad \text{for all } n \ge 0. \tag{0.1.3}$$

(In the case of the Fibonacci numbers, we have $c_\alpha = 1/\sqrt{5}$ and $c_\beta = -1/\sqrt{5}$.)
Moreover, one can determine the values of $c_\alpha$ and $c_\beta$ by solving the simultaneous
equations obtained by evaluating the formula at n = 0 and n = 1, that is,

$$c_\alpha + c_\beta = x_0 \quad\text{and}\quad c_\alpha\alpha + c_\beta\beta = x_1.$$
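To see (0.1.3) in action, here is a small Python sketch (an illustration, not part of the text's argument). It finds α and β from `cmath.sqrt` of the discriminant $a^2 + 4b$, solves the two simultaneous equations above, which give $c_\alpha = (x_1 - \beta x_0)/(\alpha - \beta)$ and $c_\beta = (\alpha x_0 - x_1)/(\alpha - \beta)$, and compares $c_\alpha\alpha^n + c_\beta\beta^n$ with the terms generated by (0.1.2). The function names are hypothetical, and the floating-point tolerance `1e-9` is an assumption of the sketch.

```python
import cmath


def recurrence_terms(a, b, x0, x1, n_max):
    """Return [x_0, ..., x_{n_max}] for x_n = a*x_{n-1} + b*x_{n-2}, as in (0.1.2)."""
    xs = [x0, x1]
    while len(xs) <= n_max:
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[: n_max + 1]


def closed_form_coefficients(a, b, x0, x1):
    """Return (alpha, beta, c_alpha, c_beta) for (0.1.3).

    alpha and beta are the roots of x^2 - a*x - b; c_alpha and c_beta solve
    c_alpha + c_beta = x0 and c_alpha*alpha + c_beta*beta = x1.
    """
    root_disc = cmath.sqrt(a * a + 4 * b)
    alpha, beta = (a + root_disc) / 2, (a - root_disc) / 2
    if alpha == beta:
        raise ValueError("alpha == beta (a^2 + 4b = 0): (0.1.3) does not apply")
    c_alpha = (x1 - beta * x0) / (alpha - beta)
    c_beta = (alpha * x0 - x1) / (alpha - beta)
    return alpha, beta, c_alpha, c_beta


def check_closed_form(a, b, x0, x1, n_max=20):
    """Assert that (0.1.3) reproduces the recurrence for 0 <= n <= n_max."""
    alpha, beta, c_alpha, c_beta = closed_form_coefficients(a, b, x0, x1)
    for n, x in enumerate(recurrence_terms(a, b, x0, x1, n_max)):
        value = c_alpha * alpha ** n + c_beta * beta ** n
        assert abs(value - x) <= 1e-9 * max(1, abs(x)), (n, value, x)
    return c_alpha, c_beta


# Fibonacci numbers: c_alpha should be 1/sqrt(5) and c_beta should be -1/sqrt(5).
c_alpha, c_beta = check_closed_form(1, 1, 0, 1)
print(c_alpha.real, c_beta.real)
# Roots 3 and -1, and then a case with complex roots (a^2 + 4b = -3).
check_closed_form(2, 3, 0, 1)
check_closed_form(1, -1, 0, 1)
```

Working over the complex numbers lets the same code handle the case $a^2 + 4b < 0$, where α and β are complex conjugates yet every $x_n$ is still an integer.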


Exercise 0.1.4. (a) Prove (0.1.3) is correct by verifying that it holds for n = 0, 1 (with $x_0$ and
$x_1$ as in the last displayed equation) and then by induction for n ≥ 2.

(b) Show that $c_\alpha$ and $c_\beta$ are uniquely determined by $x_0$ and $x_1$, provided $\alpha \ne \beta$.

(c) Show that if $\alpha \ne \beta$ with $x_0 = 0$ and $x_1 = 1$, then $x_n = \frac{\alpha^n - \beta^n}{\alpha - \beta}$ for all integers n ≥ 0.

(d) Show that if $\alpha \ne \beta$ with $y_0 = 2$, $y_1 = a$, and $y_n = a y_{n-1} + b y_{n-2}$ for all n ≥ 2, then
$y_n = \alpha^n + \beta^n$ for all integers n ≥ 0.


The $\{x_n\}_{n \ge 0}$ in (c) is a Lucas sequence, and the $\{y_n\}_{n \ge 0}$ in (d) its companion sequence.


Exercise 0.1.5.³ (a) Prove that $\alpha = \beta$ if and only if $a^2 + 4b = 0$.
(b)* Show that if $a^2 + 4b = 0$, then $\alpha = a/2$ and $x_n = (cn + d)\alpha^n$ for all integers n ≥ 0, for
some constants c and d.
(c) Deduce that if $\alpha = \beta$ with $x_0 = 0$ and $x_1 = 1$, then $x_n = n\alpha^{n-1}$ for all n ≥ 0.


Exercise 0.1.6. Prove that if $x_0 = 0$ and $x_1 = 1$, if (0.1.2) holds, and if α is a root of $x^2 - ax - b$,
then $\alpha^n = \alpha x_n + b x_{n-1}$ for all n ≥ 1.


3 In this question, and from here on, induction should be used at the reader’s discretion.




0.2. Formulas for sums of powers of integers 


When Gauss was ten years old, his mathematics teacher aimed to keep his class
quiet by asking them to add together the integers from 1 to 100. Gauss did this in
a few moments, by noting that if one adds that list of numbers to itself, but with the
second list in reverse order, then one has

$$1 + 100 = 2 + 99 = 3 + 98 = \cdots = 99 + 2 = 100 + 1 = 101.$$

That is, twice the asked-for sum equals 100 times 101, and so

$$1 + 2 + \cdots + 100 = \frac{1}{2} \times 100 \times 101.$$


This argument generalizes to adding up the natural numbers less than any given
N, yielding the formula⁴

$$\sum_{n=1}^{N-1} n = \frac{(N-1)N}{2}. \tag{0.2.1}$$

The sum on the left-hand side of this equation varies in length with N, whereas 
the right-hand side does not. The right-hand side is a formula whose value varies 
but has a relatively simple structure, so we call it a closed form expression. (In the 
prerequisite section, we gave a less interesting proof of this formula, by induction.) 


Exercise 0.2.1. (a) Prove that $1 + 3 + 5 + \cdots + (2N - 1) = N^2$ for all N ≥ 1 by induction.
(b) Prove the formula in part (a) by the young Gauss’s method.
(c) Start with a single dot, thought of as a 1-by-1 array of dots, and extend it to a 2-by-2 array
of dots by adding an appropriate row and column. You have added 3 dots to the original
dot and so $1 + 3 = 2^2$.

[Figure: nested square arrays of dots illustrating $1 + 3 + 5 + \cdots$.]

In general, draw an N-by-N array of dots, and add an additional row and column of dots
to obtain an (N + 1)-by-(N + 1) array of dots. By determining how many dots were added
to the number of dots that were already in the array, deduce the formula in (a).


Let $S = \sum_{n=1}^{N-1} n^2$. Using exercise 0.2.1 we can write each square, $n^2$, as the
sum of the odd positive integers < 2n. Therefore 2m − 1 appears N − m times in
the sum for S, and so

$$S = \sum_{m=1}^{N-1} (2m-1)(N-m) = -N\sum_{m=1}^{N-1} 1 + (2N+1)\sum_{m=1}^{N-1} m - 2S.$$


4 This same idea appears in the work of Archimedes, from the third century B.C., in ancient Greece.




Using our closed formula for $\sum_m m$, we deduce that
$3S = -N(N-1) + (2N+1)\frac{(N-1)N}{2} = \frac{(N-1)N(2N-1)}{2}$, and so

$$\sum_{n=1}^{N-1} n^2 = \frac{(N-1)N(2N-1)}{6},$$

a closed formula for the sum of the squares up to a given point. There is also a
closed formula for the sum of the cubes:

$$\sum_{n=1}^{N-1} n^3 = \left(\frac{(N-1)N}{2}\right)^2. \tag{0.2.2}$$

This is the square of the closed formula (0.2.1) that we obtained for $\sum_{n=1}^{N-1} n$. Is
this a coincidence or the first hint of some surprising connection?


Exercise 0.2.2. Prove these last two formulas by induction. 
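Before proving such formulas it can be reassuring to test them. Here is a brute-force Python check (evidence for small N, not a proof) of (0.2.1), of the formula for the sum of the squares, of (0.2.2), and of the identity $S = \sum_{m=1}^{N-1}(2m-1)(N-m)$ used in its derivation. The helper name `power_sum` and the range N < 60 are our own choices.

```python
def power_sum(k, N):
    """Brute-force sum of n^k over 1 <= n <= N - 1."""
    return sum(n ** k for n in range(1, N))


for N in range(1, 60):
    # (0.2.1), the young Gauss's formula
    assert power_sum(1, N) == (N - 1) * N // 2
    # the closed formula for the sum of the squares
    assert power_sum(2, N) == (N - 1) * N * (2 * N - 1) // 6
    # (0.2.2): the sum of the cubes is the square of (0.2.1)
    assert power_sum(3, N) == ((N - 1) * N // 2) ** 2
    # the identity S = sum_{m=1}^{N-1} (2m - 1)(N - m) used in the derivation
    assert power_sum(2, N) == sum((2 * m - 1) * (N - m) for m in range(1, N))

print(power_sum(1, 101))  # 5050, the sum 1 + 2 + ... + 100
```

Integer division `//` is exact here because each numerator is divisible by its denominator, so all comparisons are between integers.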


These three examples suggest that there are closed formulas for the sums of the 
kth powers of the integers, for every k ≥ 1, but it is difficult to guess exactly what
those formulas might look like. Moreover, to hope to prove a formula by induction, 
we need to have the formula at hand. 


We will next find a closed formula in a simpler but related question and use this
to find a closed formula for the sums of the kth powers of the integers in appendix
0A. We will go on to investigate, in section 7.34 of appendix 7I, whether there are
other amazing identities for sums of different powers, like

$$\sum_{n=1}^{N-1} n^3 = \left(\sum_{n=1}^{N-1} n\right)^2.$$
0.3. The binomial theorem, Pascal’s triangle, and the binomial coefficients


The binomial coefficient $\binom{n}{m}$ is defined to be the number of different ways of choosing
m objects from n. (Therefore $\binom{n}{m} = 0$ whenever m < 0 or m > n.) From this
definition we see that the binomial coefficients are all integers. To determine $\binom{5}{2}$ we
note that there are 5 choices for the first object and 4 for the second, but then we
have counted each pair of objects twice (since we can select them in either order),
and so $\binom{5}{2} = \frac{5 \times 4}{2}$. It is arguably nicer to write 5 × 4 as $\frac{5!}{3!}$ so that
$\binom{5}{2} = \frac{5!}{2!\,3!}$. One can develop this proof to show that, for any integers 0 ≤ m ≤ n,
one has the very neat formula⁵

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}, \quad \text{where } r! = r \cdot (r-1) \cdots 2 \cdot 1. \tag{0.3.1}$$


From this formula alone it is not obvious that the binomial coefficients are integers. 


Exercise 0.3.1. (a) Prove that $\binom{n+1}{m} = \binom{n}{m} + \binom{n}{m-1}$ for all integers m, and all integers n ≥ 0.
(b) Deduce from (a) that each $\binom{n}{m}$ is an integer.

5 We prefer to work with the closed formula 27!/(15! 12!) rather than to evaluate it as 17383860, since
the three factorials are easier to appreciate and to manipulate in subsequent calculations, particularly
when looking for patterns.




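Pascal’s triangle is a triangular array in which the (n + 1)st row contains the
binomial coefficients $\binom{n}{m}$, with m increasing from 0 to n, as one goes from left to
right:

$$\begin{array}{ccccccccccccc}
 & & & & & & 1 & & & & & & \\
 & & & & & 1 & & 1 & & & & & \\
 & & & & 1 & & 2 & & 1 & & & & \\
 & & & 1 & & 3 & & 3 & & 1 & & & \\
 & & 1 & & 4 & & 6 & & 4 & & 1 & & \\
 & 1 & & 5 & & 10 & & 10 & & 5 & & 1 & \\
1 & & 6 & & 15 & & 20 & & 15 & & 6 & & 1
\end{array}$$

etc.

The addition formula in exercise 0.3.1(a) yields a rule for obtaining a row from the
previous one, by adding any two neighboring entries to give the entry immediately
below. For example, the third entry in the bottom row is immediately below 5 and
10 (to either side) and so equals 5 + 10 = 15. The next entry is 10 + 10 = 20, etc.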
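The addition rule translates directly into a few lines of code. Below is a minimal Python sketch, assuming Python 3.8 or later for `math.comb`: each new row is formed by adding neighboring entries of the previous row (padded by a 0 at each end), and the result is compared with formula (0.3.1) as computed by `math.comb`. The function name `pascal_rows` is our own.

```python
from math import comb


def pascal_rows(n_rows):
    """The first n_rows rows of Pascal's triangle, each obtained from the previous
    row by adding neighboring entries (the addition formula of exercise 0.3.1(a))."""
    row = [1]
    rows = [row]
    for _ in range(n_rows - 1):
        # Pad with a 0 on each side so that the end entries come out as 0 + 1 = 1.
        row = [left + right for left, right in zip([0] + row, row + [0])]
        rows.append(row)
    return rows


for n, row in enumerate(pascal_rows(12)):
    assert row == [comb(n, m) for m in range(n + 1)]  # agrees with (0.3.1)

print(pascal_rows(7)[-1])  # [1, 6, 15, 20, 15, 6, 1], the bottom row displayed above
```

Note that this computation uses only additions, in line with exercise 0.3.1(b): every entry is visibly an integer, which is not obvious from (0.3.1).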


The binomial theorem states that if n is an integer ≥ 1, then

$$(x + y)^n = \sum_{m=0}^{n} \binom{n}{m} x^{n-m} y^m.$$


Exercise 0.3.2.† Using exercise 0.3.1(a) and induction on n ≥ 1, prove the binomial theorem.


Notice that one can read off the coefficients of $(x + y)^n$ from the (n + 1)st row
of Pascal’s triangle; for example, reading off the bottom row above (which is the
7th row down of Pascal’s triangle), we obtain

$$(x + y)^6 = x^6 + 6x^5y + 15x^4y^2 + 20x^3y^3 + 15x^2y^4 + 6xy^5 + y^6.$$


In the previous section we raised the question of finding a closed formula for
the sum of $n^k$, over all positive integers n < N. We can make headway in a related
question in which we replace $n^k$ with a different polynomial in n of degree k, namely
the binomial coefficient

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!}.$$

This is a polynomial of degree k in n. For example, we have $\binom{n}{3} = \frac{n^3}{6} - \frac{n^2}{2} + \frac{n}{3}$, a
polynomial in n of degree 3. We can identify a closed formula for the sum of these
binomial coefficients, over all non-negative integers n < N, namely:

$$\sum_{n=0}^{N-1} \binom{n}{k} = \binom{N}{k+1} \tag{0.3.2}$$




for all N ≥ 1 and k ≥ 0. For k = 2, N = 6, this can be seen in the following diagram:


so that 1+3+6+ 10 equals 20. 
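A quick numerical check of (0.3.2), and of the cubic expression for $\binom{n}{3}$ given above, takes only a few lines. This Python sketch assumes Python 3.8 or later for `math.comb`, which returns 0 when m > n, matching the convention $\binom{n}{m} = 0$ for m > n. It tests small cases only and is not a proof; the helper name `hockey_stick_holds` is our own.

```python
from fractions import Fraction
from math import comb


def hockey_stick_holds(N, k):
    """Check (0.3.2): the sum of C(n, k) over 0 <= n <= N - 1 equals C(N, k + 1)."""
    return sum(comb(n, k) for n in range(N)) == comb(N, k + 1)


# The case k = 2, N = 6 discussed above: 1 + 3 + 6 + 10 = 20.
assert sum(comb(n, 2) for n in range(6)) == 20 == comb(6, 3)

# Many more small cases.
assert all(hockey_stick_holds(N, k) for N in range(1, 40) for k in range(10))

# C(n, 3) agrees with the polynomial n^3/6 - n^2/2 + n/3 (exact rational arithmetic).
for n in range(30):
    assert comb(n, 3) == Fraction(n ** 3, 6) - Fraction(n ** 2, 2) + Fraction(n, 3)
```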


Exercise 0.3.3. Prove (0.3.2) for each fixed k ≥ 1, for each N ≥ k + 1, using induction and
exercise 0.3.1(a). You might also try to prove it by induction using the idea behind the illustration
in the last diagram.


If we instead display Pascal’s triangle by lining up the initial 1’s and then
summing the diagonals,

$$\begin{array}{lllllll}
1 \\
1 & 1 \\
1 & 2 & 1 \\
1 & 3 & 3 & 1 \\
1 & 4 & 6 & 4 & 1 \\
1 & 5 & 10 & 10 & 5 & 1 \\
1 & 6 & 15 & 20 & 15 & 6 & 1
\end{array}$$

the sums are 1, 1, 1 + 1, 1 + 2, 1 + 3 + 1, 1 + 4 + 3, 1 + 5 + 6 + 1, ..., which equal
1, 1, 2, 3, 5, 8, 13, ..., the Fibonacci numbers. It therefore seems likely that

$$F_n = \sum_{k=0}^{n-1} \binom{n-1-k}{k} \quad \text{for all } n \ge 1. \tag{0.3.3}$$

Exercise 0.3.4. Prove (0.3.3) for each integer n ≥ 1, by induction using exercise 0.3.1(a).
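Here is a small Python check of (0.3.3) for n < 60 (evidence, not a proof). The helper `fib` iterates the Fibonacci recurrence, and `diagonal_sum(n)` adds the entries $\binom{n-1-k}{k}$ along one diagonal of the left-aligned triangle; both names are our own, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fib(n):
    """F_n from the recurrence F_n = F_{n-1} + F_{n-2}, with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def diagonal_sum(n):
    """Right-hand side of (0.3.3): sum of C(n - 1 - k, k) for 0 <= k <= n - 1."""
    return sum(comb(n - 1 - k, k) for k in range(n))


# The diagonal sums listed above.
assert [diagonal_sum(n) for n in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]
# (0.3.3) for many more n.
assert all(diagonal_sum(n) == fib(n) for n in range(1, 60))
```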


Articles with further thoughts on factorials and binomial coefficients 


[1] Manjul Bhargava, The factorial function and generalizations, Amer. Math. Monthly 107 (2000), 
783-799 (preprint). 


[2] John J. Watkins, chapter 5 of Number theory. A historical approach, Princeton University Press, 
2014. 
Additional exercises 
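Exercise 0.4.1. (a) Prove that for all n ≥ 1 we have

$$\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n} = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}.$$

(b) Deduce that $F_{n+1}F_{n-1} - F_n^2 = (-1)^n$ for all n ≥ 1.
(c) Deduce that $F_{n+1}^2 - F_{n+1}F_n - F_n^2 = (-1)^n$ for all n ≥ 0.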
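Exercise 0.4.1(a) also suggests a fast way to compute Fibonacci numbers: raise the matrix to the nth power by repeated squaring, which needs only about $2\log_2 n$ matrix multiplications. The text does not discuss this algorithm; the Python sketch below is our own illustration, with hypothetical helper names, and it checks parts (a) and (b) for n < 50 against plain iteration.

```python
def mat_mult(A, B):
    """Product of two 2x2 matrices given as ((p, q), (r, s))."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )


def mat_pow(M, n):
    """M^n for n >= 0, by repeated squaring."""
    result = ((1, 0), (0, 1))  # the identity matrix
    while n > 0:
        if n & 1:
            result = mat_mult(result, M)
        M = mat_mult(M, M)
        n >>= 1
    return result


def fib_triple(n):
    """(F_{n+1}, F_n, F_{n-1}) read off from [[1, 1], [1, 0]]^n, for n >= 1."""
    P = mat_pow(((1, 1), (1, 0)), n)
    return P[0][0], P[0][1], P[1][1]


def fib(n):
    """F_n by plain iteration, for comparison."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for n in range(1, 50):
    f_next, f_n, f_prev = fib_triple(n)
    assert (f_next, f_n, f_prev) == (fib(n + 1), fib(n), fib(n - 1))  # part (a)
    assert f_next * f_prev - f_n ** 2 == (-1) ** n                    # part (b)

print(fib_triple(10))  # (89, 55, 34)
```

Part (b) is visible here as a determinant: the matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ has determinant −1, so its nth power has determinant $(-1)^n$.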




Exercise 0.4.2.† Deduce from (0.1.1) that the Fibonacci number $F_n$ is the nearest integer to
$\phi^n/\sqrt{5}$, for each integer n ≥ 0, where the constant $\phi := \frac{1+\sqrt{5}}{2}$. This golden ratio appears in art
and architecture when attempting to describe “perfect proportions”.


Exercise 0.4.3. Prove that $F_n^2 + F_{n+3}^2 = 2\left(F_{n+1}^2 + F_{n+2}^2\right)$ for all n ≥ 0.


Exercise 0.4.4. Prove that for all n ≥ 1 we have
$$F_{2n-1} = F_{n-1}^2 + F_n^2 \quad\text{and}\quad F_{2n} = F_{n+1}^2 - F_{n-1}^2.$$


Exercise 0.4.5. Use (0.1.1) to prove the following:
(a) For every r we have $F_n^2 - F_{n+r}F_{n-r} = (-1)^{n-r}F_r^2$ for all n ≥ r.
(b) For all m ≥ n ≥ 0 we have $F_mF_{n+1} - F_{m+1}F_n = (-1)^n F_{m-n}$.


Exercise 0.4.6. Let $u_0 = b$ and $u_{n+1} = a u_n$ for all n ≥ 0. Give a formula for all $u_n$ with n ≥ 0.


The expression 011010 is a string of 0’s and 1’s. There are $2^n$ strings of 0’s and
1’s of length n, as there are two possibilities for each entry. Let $A_n$ be the set of
strings of 0’s and 1’s of length n which contain no two consecutive 1’s. Our example
011010 does not belong to $A_6$ as the second and third characters are consecutive
1’s, whereas 01001010 is in $A_8$. Calculations reveal that $|A_1| = 2$, $|A_2| = 3$, and
$|A_3| = 5$, data which suggests that perhaps $|A_n| = F_{n+2}$, the (n + 2)nd Fibonacci number.
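The data in the last paragraph can be extended by brute force. This Python sketch enumerates all $2^n$ strings with `itertools.product` and counts those that do not contain the substring `"11"`. Since it is exponential in n, it only provides evidence for small n (here n ≤ 15), not a proof; the function names are our own.

```python
from itertools import product


def count_no_11(n):
    """|A_n| by brute force: strings of 0's and 1's of length n with no "11"."""
    return sum(1 for bits in product("01", repeat=n) if "11" not in "".join(bits))


def fib(n):
    """F_n from the Fibonacci recurrence, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


assert "11" in "011010" and "11" not in "01001010"  # the two examples above
print([count_no_11(n) for n in range(1, 6)])  # [2, 3, 5, 8, 13]
assert all(count_no_11(n) == fib(n + 2) for n in range(1, 16))
```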


Exercise 0.4.7.† (a) If 0w is a string of 0’s and 1’s of length n, prove that $0w \in A_n$ if and
only if $w \in A_{n-1}$.
(b) If 10w is a string of 0’s and 1’s of length n, prove that $10w \in A_n$ if and only if $w \in A_{n-2}$.
(c) Prove that $|A_n| = F_{n+2}$ for all n ≥ 1, by induction on n.


Exercise 0.4.8.† Prove that every positive integer other than the powers of 2 can be written as
the sum of two or more consecutive integers. 


Exercise 0.4.9. Prove that $\binom{n}{a}\binom{a}{m} = \binom{n}{m}\binom{n-m}{a-m}$ for any integers n ≥ a ≥ m ≥ 0.


Exercise 0.4.10.† Suppose that a and b are integers and $\{x_n : n \ge 0\}$ is the second-order linear
recurrence sequence given by (0.1.2) with $x_0 = 0$ and $x_1 = 1$.
(a) Prove that for all non-negative integers m we have

$$x_{m+k} = x_{m+1}x_k + b\,x_m x_{k-1} \quad \text{for all integers } k \ge 1.$$

(b) Deduce that

$$x_{2n+1} = x_{n+1}^2 + b\,x_n^2 \quad\text{and}\quad x_{2n} = x_{n+1}x_n + b\,x_n x_{n-1} \quad \text{for all natural numbers } n.$$


Exercise 0.4.11. Suppose that the sequences $\{x_n : n \ge 0\}$ and $\{y_n : n \ge 0\}$ both satisfy (0.1.2)
and that $x_0 = 0$ and $x_1 = 1$, whereas $y_0$ and $y_1$ might be anything. Prove that
$y_n = y_1 x_n + b\,y_0 x_{n-1}$ for all n ≥ 1.

Exercise 0.4.12. Suppose that $x_0 = 0$, $x_1 = 1$, and $x_{n+2} = a x_{n+1} + b x_n$. Prove that for all
n ≥ 1 we have
(a) $(a + b - 1)\sum_{j=0}^{n} x_j = x_{n+1} + b x_n - 1$;
(b) $a\left(b^{n-1}x_1^2 + b^{n-2}x_2^2 + \cdots + b x_{n-1}^2 + x_n^2\right) = x_n x_{n+1}$;
(c) $x_n^2 - x_{n-1}x_{n+1} = (-b)^{n-1}$.
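Identities like those in exercises 0.4.10 to 0.4.12 are easy to test numerically before attempting a proof. The Python sketch below (our own, with hypothetical helper names) generates $x_n$ from $x_0 = 0$, $x_1 = 1$ and (0.1.2) in exact integer arithmetic, then checks 0.4.10(b) and 0.4.12(a) to (c) for all integers −3 ≤ a, b ≤ 3 and 1 ≤ n ≤ 25. A check of finitely many cases is, of course, not a proof.

```python
def lucas_u(a, b, n_max):
    """[x_0, ..., x_{n_max}] with x_0 = 0, x_1 = 1 and x_{n+2} = a x_{n+1} + b x_n."""
    xs = [0, 1]
    while len(xs) <= n_max:
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[: n_max + 1]


def check_identities(a, b, n_max=25):
    """Assert exercises 0.4.10(b) and 0.4.12(a)-(c) for 1 <= n <= n_max."""
    x = lucas_u(a, b, 2 * n_max + 2)
    for n in range(1, n_max + 1):
        # 0.4.10(b)
        assert x[2 * n + 1] == x[n + 1] ** 2 + b * x[n] ** 2
        assert x[2 * n] == x[n + 1] * x[n] + b * x[n] * x[n - 1]
        # 0.4.12(a)
        assert (a + b - 1) * sum(x[: n + 1]) == x[n + 1] + b * x[n] - 1
        # 0.4.12(b): a * (b^(n-1) x_1^2 + ... + b x_{n-1}^2 + x_n^2) = x_n x_{n+1}
        assert a * sum(b ** (n - j) * x[j] ** 2 for j in range(1, n + 1)) == x[n] * x[n + 1]
        # 0.4.12(c)
        assert x[n] ** 2 - x[n - 1] * x[n + 1] == (-b) ** (n - 1)
    return True


for a in range(-3, 4):
    for b in range(-3, 4):
        check_identities(a, b)
```

With a = b = 1 these reduce to familiar Fibonacci identities; for instance 0.4.12(c) becomes exercise 0.4.1(b) with n replaced by n − 1 and the sign absorbed.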


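Exercise 0.4.13. Suppose that $x_{n+2} = a x_{n+1} + b x_n$ for all n ≥ 0.

(a) Show that

$$\begin{pmatrix} x_{n+2} & x_{n+1} \\ x_{n+1} & x_n \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix}^{n} \begin{pmatrix} x_2 & x_1 \\ x_1 & x_0 \end{pmatrix} \quad \text{for all } n \ge 0.$$

(b) Deduce that $x_{n+2}x_n - x_{n+1}^2 = c(-b)^n$ for all n ≥ 0, where $c := x_2x_0 - x_1^2$.

(c) Deduce that $x_{n+1}^2 - a x_{n+1}x_n - b x_n^2 = -c(-b)^n$.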




Other number-theoretic sequences can be obtained from linear recurrences or
other types of recurrences. Besides the Fibonacci numbers, there is another sequence
of integers that is traditionally denoted by $(F_n)_{n \ge 0}$: these are the Fermat
numbers, $F_n = 2^{2^n} + 1$ for all n ≥ 0 (see sections 3.11 of appendix 3A, 5.1, and 5.25 of
appendix 5H, etc.).


Exercise 0.4.14. Show that if $F_0 = 3$ and $F_{n+1} = F_n^2 - 2F_n + 2$, then $F_n = 2^{2^n} + 1$ for all
n ≥ 0.


Exercise 0.4.15. (a) Show that if $M_0 = 0$, $M_1 = 1$, and $M_{n+2} = 3M_{n+1} - 2M_n$ for all integers
n ≥ 0, then $M_n = 2^n - 1$ for all integers n ≥ 0. The integer $M_n$ is the nth Mersenne number
(see exercise 2.5.16 and sections 4.2, 5.1, etc.).
(b) Show that if $M_0 = 0$ with $M_{n+1} = 2M_n + 1$ for all n ≥ 0, then $M_n = 2^n - 1$.


Exercise 0.4.16. We can reinterpret exercise 0.4.3 as giving a recurrence relation for the
sequence $\{F_n^2\}_{n \ge 0}$, where $F_n$ is the nth Fibonacci number; that is,

$$F_{n+3}^2 = 2F_{n+2}^2 + 2F_{n+1}^2 - F_n^2 \quad \text{for all } n \ge 0.$$

Here $F_{n+3}^2$ is described in terms of the last three terms of the sequence; this is called a linear
recurrence of order 3. Prove that for any integer k ≥ 1, the sequence $\{F_n^k\}_{n \ge 0}$ satisfies a linear
recurrence of order k + 1.


How to proceed through this book. It can be challenging to decide what 
proof technique to try on a given question. There is no simple guide—practice is 
what best helps decide how to proceed. Some students find Zeitz’s book 
helpful as it exhibits all of the important techniques in context. I like Conway and 
Guy’s [CG96] since it has lots of great questions, beautifully discussed with great
illustrations, and introduces quite a few of the topics from this book. 


A paper that questions one’s assumptions is 


[1] Richard K. Guy, The strong law of small numbers, Amer. Math. Monthly 95 (1988), 697-712.


Appendix 0A. A closed formula for sums of powers


In chapter 0, we discussed closed form expressions for sums of powers. We will 
prove here that there is such a formula for the sum of the kth powers of the integers
up to a given point, developing themes from earlier in this chapter. 


0.5. Formulas for sums of powers of integers, II 


Our goal in this section is to use our formula (0.3.2) for summing binomial
coefficients, to find a formula for summing powers of integers. For example, since

$$n^3 = 6\binom{n}{3} + 6\binom{n}{2} + \binom{n}{1},$$

we can use (0.3.2) with k = 3, 2, and 1, respectively, to obtain

$$\sum_{n=0}^{N-1} n^3 = 6\sum_{n=0}^{N-1}\binom{n}{3} + 6\sum_{n=0}^{N-1}\binom{n}{2} + \sum_{n=0}^{N-1}\binom{n}{1} = 6\binom{N}{4} + 6\binom{N}{3} + \binom{N}{2}.$$

Summing these three multiples of binomial coefficients gives the formula for the
sum of the cubes of the integers up to N − 1, which we encountered in section
0.2. To make this same technique work to sum $n^k$, for arbitrary integer k ≥ 1, we
need to show that $x^k$ can be expressed as a sum of fixed multiples of the binomial
coefficients $\binom{x}{0}, \binom{x}{1}, \ldots, \binom{x}{k}$, where by $\binom{x}{k}$ we mean the polynomial

$$\binom{x}{k} = \frac{x(x-1)\cdots(x-(k-1))}{k!}.$$

Notice that if we substitute x = n into this expression, we obtain the binomial
coefficient $\binom{n}{k}$.




Proposition 0.5.1. Any polynomial f(x) ∈ Z[x] of degree k ≥ 0 can be written as
a sum of integer multiples of the binomial coefficients $\binom{x}{0}, \ldots, \binom{x}{k}$.

Proof. By induction on k. The result is immediate for k = 0. Otherwise, suppose
that f(x) has leading term $ax^k$; then subtract $a \cdot k! \cdot \binom{x}{k}$, which also has leading
term $ax^k$ and lies in Z[x], since $k!\binom{x}{k} = x(x-1)\cdots(x-k+1)$. The resulting polynomial,
$g(x) = f(x) - a \cdot k! \cdot \binom{x}{k}$, has degree at most k − 1, so can be written as
$c_0\binom{x}{0} + \cdots + c_{k-1}\binom{x}{k-1}$ by the induction hypothesis. But then

$f(x) = c_0\binom{x}{0} + \cdots + c_k\binom{x}{k}$, with $c_k = a \cdot k!$, as desired.


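In particular, there are integers $c_0, c_1, \ldots, c_k$ for which

$$x^k = c_k\binom{x}{k} + \cdots + c_1\binom{x}{1} + c_0\binom{x}{0}. \tag{0.5.1}$$

One can then immediately deduce, from (0.3.2), that

$$\sum_{n=0}^{N-1} n^k = c_k\sum_{n=0}^{N-1}\binom{n}{k} + \cdots + c_1\sum_{n=0}^{N-1}\binom{n}{1} + c_0\sum_{n=0}^{N-1}\binom{n}{0} = c_k\binom{N}{k+1} + \cdots + c_1\binom{N}{2} + c_0\binom{N}{1}.$$

Expanding out the binomial coefficients, this gives the desired closed form
expression for $\sum_{n=0}^{N-1} n^k$, a polynomial in N of degree k + 1.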

There is a difficulty. We proved that the $c_j$ exist but did not show how to
determine them. We can do this by successively substituting x = 0, then x = 1,
then ..., then x = k − 1 into (0.5.1) (recall from the proof that $c_k = k!$), since one obtains

$$0^k = c_k \cdot 0 + \cdots + c_1 \cdot 0 + c_0,$$

and so $c_0 = 0$; then

$$1^k = c_k \cdot 0 + \cdots + c_2 \cdot 0 + c_1 + c_0,$$

and so $c_1 = 1$; and then $c_2 = 2^k - 2$, $c_3 = 3^k - 3 \cdot 2^k + 3$, etc. We end this appendix
with a particularly challenging exercise.
Exercise 0.5.1.† (a) Establish that (0.5.1) holds with

$$c_m = m^k - \binom{m}{1}(m-1)^k + \binom{m}{2}(m-2)^k - \cdots + (-1)^{m-1}\binom{m}{m-1}1^k$$

for all m ≥ 1 and for all k ≥ 1. The integers $c_m/m!$ are the Stirling numbers of the second
kind, usually denoted by $S_2(k, m)$. They arise in several interesting combinatorial settings;
for example, $S_2(k, m)$ is the number of ways to partition a set of k objects into m non-empty
subsets.

(b) Deduce that, for any given integer k ≥ 0, there exist rational numbers $a_0, a_1, \ldots, a_{k+1}$ for
which $\sum_{n=0}^{N-1} n^k = a_0 + a_1N + \cdots + a_{k+1}N^{k+1}$ for all integers N ≥ 1.


Exercise 0.5.2. Prove that $c_j/j!$ is an integer for all j ≥ 0 in (0.5.1).


Exercise 0.5.3.† Let f(x) ∈ C[x]. Prove that f(n) is an integer for all integers n if and only if
$f(x) = \sum_m a_m\binom{x}{m}$ where the $a_m$ are all integers.


We will return to this topic, finding an elegant description of the rational numbers
$a_i$ by introducing the Bernoulli numbers in the next appendix, appendix 0B.
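The substitution procedure described above is easy to mechanize. In this Python sketch (our own illustration; the function names are hypothetical), substituting x = 0, 1, ..., k into (0.5.1) determines one new $c_j$ at each step, since $\binom{x}{j} = 0$ for j > x and $\binom{x}{x} = 1$; we also substitute x = k, which recovers $c_k = k!$. The script then checks the results against the formula in exercise 0.5.1(a), checks that $c_m/m!$ is an integer, and uses (0.3.2) to evaluate $\sum_{n=0}^{N-1} n^k$ in closed form, comparing with a brute-force sum.

```python
from math import comb, factorial


def binomial_basis_coefficients(k):
    """Return [c_0, ..., c_k] with x^k = sum_j c_j * C(x, j), as in (0.5.1).

    Substituting x = 0, 1, ..., k in turn: at x, every C(x, j) with j > x
    vanishes and C(x, x) = 1, so c_x is the only new unknown.
    """
    c = []
    for x in range(k + 1):
        c.append(x ** k - sum(c[j] * comb(x, j) for j in range(x)))
    return c


def closed_form_power_sum(k, N):
    """sum_{n=0}^{N-1} n^k computed as sum_j c_j * C(N, j + 1), using (0.3.2)."""
    return sum(cj * comb(N, j + 1) for j, cj in enumerate(binomial_basis_coefficients(k)))


print(binomial_basis_coefficients(3))  # [0, 1, 6, 6], i.e. n^3 = 6C(n,3) + 6C(n,2) + C(n,1)

for k in range(1, 9):
    c = binomial_basis_coefficients(k)
    assert c[k] == factorial(k)
    for m in range(1, k + 1):
        # exercise 0.5.1(a): c_m = sum_{j=0}^{m} (-1)^j C(m, j) (m - j)^k
        assert c[m] == sum((-1) ** j * comb(m, j) * (m - j) ** k for j in range(m + 1))
        # c_m / m! is the Stirling number S_2(k, m), an integer
        assert c[m] % factorial(m) == 0
    for N in range(1, 30):
        assert closed_form_power_sum(k, N) == sum(n ** k for n in range(N))
```

For k ≥ 1 the j = m term of the alternating sum is $(-1)^m 0^k = 0$, which is why the displayed formula in exercise 0.5.1(a) can stop at $1^k$.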


Appendix 0B. Generating functions


The generating function (or generating series) of a given sequence of numbers
$a_0, a_1, \ldots$ is the power series

$$a_0 + a_1x + a_2x^2 + \cdots$$

involving a variable x, where the nth term is $a_nx^n$. We now see how generating
functions allow us to provide alternative, elegant proofs of the results of this chapter.
We begin with an alternative proof of (0.3.2) that exhibits the power of constructing
generating functions.


Exercise 0.6.1. (a) Prove that for every integer k ≥ 0 one has

$$\frac{1}{(1-t)^{k+1}} = \binom{k}{k} + \binom{k+1}{k}t + \binom{k+2}{k}t^2 + \cdots + \binom{k+m}{k}t^m + \cdots.$$

(b) Prove that (0.3.2) follows by equating the coefficient of $t^{N-k-1}$ on either side of

$$\frac{1}{1-t}\cdot\frac{1}{(1-t)^{k+1}} = \frac{1}{(1-t)^{k+2}}.$$

(c) Multiply this identity through by 1 − t and reprove the formula in exercise 0.3.1(a) by
equating the coefficients on each side.
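Multiplying a power series by $\frac{1}{1-t} = 1 + t + t^2 + \cdots$ replaces its coefficients by their running sums, which makes exercise 0.6.1 easy to explore numerically. The Python sketch below (our own illustration, using `itertools.accumulate`; the function name is hypothetical) builds the first few coefficients of $1/(1-t)^{k+1}$ this way, compares them with part (a), and checks the instance of (0.3.2) that part (b) produces.

```python
from itertools import accumulate
from math import comb


def inverse_power_series(k, terms):
    """First `terms` coefficients of 1/(1-t)^(k+1).

    Start from 1/(1-t) = 1 + t + t^2 + ... and multiply by 1/(1-t) k more
    times; each multiplication replaces the coefficients by running sums.
    """
    coeffs = [1] * terms
    for _ in range(k):
        coeffs = list(accumulate(coeffs))
    return coeffs


# Part (a): the coefficient of t^m in 1/(1-t)^(k+1) is C(k+m, k).
for k in range(8):
    assert inverse_power_series(k, 20) == [comb(k + m, k) for m in range(20)]

# Part (b): the coefficient of t^(N-k-1) in 1/(1-t)^(k+2) is C(N, k+1),
# and it is also the sum of C(n, k) over k <= n <= N - 1.
k, N = 3, 12
coefficient = inverse_power_series(k + 1, N)[N - k - 1]
assert coefficient == comb(N, k + 1) == sum(comb(n, k) for n in range(k, N))
```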


0.6. Formulas for sums of powers of integers, III 
The Bernoulli numbers, $B_n$, are the coefficients in the power series:

$$\frac{X}{e^X - 1} = \sum_{n \ge 0} B_n \frac{X^n}{n!}.$$

They are a sequence of numbers that occur in all sorts of interesting contexts in
number theory. The first few Bernoulli numbers are $B_0 = 1$, $B_1 = -\frac{1}{2}$, $B_2 = \frac{1}{6}$,
$B_3 = 0$, $B_4 = -\frac{1}{30}$, $B_5 = 0$, $B_6 = \frac{1}{42}$, $B_7 = 0$, $B_8 = -\frac{1}{30}$, $B_9 = 0$, $B_{10} = \frac{5}{66}$, ....




From this data we can make a few guesses as to what they look like in general: 


e Ifn is odd and > 1, then B, = 0. This is easily proved since 


n=0 n>=0 n>0 
n odd n odd n odd 
wi, SIE 2 TO, 2. 28 
X_1 eX-1 ex—-1 e*X-1 ; 


Comparing the coefficients of X” on either side of this equation, we conclude 
that By, = -4 and B,, = 0 for all odd n > 3. 


• The $B_n$ are rational. We expand the power series
$$1 = \frac{e^x-1}{x}\cdot\frac{x}{e^x-1} = \sum_{r\geq 0}\frac{x^r}{(r+1)!}\sum_{s\geq 0}B_s\frac{x^s}{s!} = \sum_{n\geq 0}\Bigg(\sum_{r+s=n}\frac{B_s}{(r+1)!\,s!}\Bigg)x^n$$
and compare the coefficients of $x^n$ on either side, multiplying through by $(n+1)!$, to obtain that $B_0 = 1$ and $\sum_{s=0}^{n}\binom{n+1}{s}B_s = 0$ for each $n \geq 1$. This can be rewritten as
$$B_n = -\frac{1}{n+1}\sum_{s=0}^{n-1}\binom{n+1}{s}B_s \quad\text{for each } n \geq 1.$$
We can then deduce by induction on $n \geq 1$ that the $B_n$ are rational, since we have given $B_n$ as a finite sum of rational numbers times Bernoulli numbers $B_s$ with $s < n$.
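The recurrence above translates directly into code. The following is a minimal sketch (our own illustration; the function name `bernoulli_numbers` is hypothetical) using exact rational arithmetic from Python's standard `fractions` module, so that the rationality of the $B_n$ is visible in the output.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max: int) -> list[Fraction]:
    """B_0, ..., B_{n_max} from B_0 = 1 and
    B_n = -1/(n+1) * sum_{s=0}^{n-1} binom(n+1, s) B_s  for n >= 1."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        total = sum(comb(n + 1, s) * B[s] for s in range(n))
        B.append(-total / (n + 1))
    return B


print(bernoulli_numbers(10))
```

It should reproduce the values $B_0, \ldots, B_{10}$ listed earlier; using `Fraction` rather than floating point keeps every step exact.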


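Next we define the Bernoulli polynomials, $B_n(t)$, as the coefficients in the power series:
$$\frac{xe^{tx}}{e^x-1} = \sum_{n\geq 0}B_n(t)\frac{x^n}{n!},$$
and therefore $B_n(0) = B_n$. To verify that these are really polynomials, note that
$$\sum_{k\geq 0}B_k(t)\frac{x^k}{k!} = \frac{x}{e^x-1}\cdot e^{tx} = \sum_{n\geq 0}B_n\frac{x^n}{n!}\cdot\sum_{m\geq 0}\frac{(tx)^m}{m!} = \sum_{m\geq 0}\sum_{n\geq 0}B_nt^m\frac{x^{m+n}}{m!\,n!}.$$
Here we change variable, writing $k = m + n$, and then the coefficient of $x^k/k!$ is
$$B_k(t) = \sum_{\substack{m,n\geq 0\\ m+n=k}}\frac{k!}{m!\,n!}B_nt^m = \sum_{n=0}^{k}\binom{k}{n}B_nt^{k-n}. \tag{0.6.1}$$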


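We have done all this preliminary work so as to prove the following extraordinary formula for the sum of the $(k-1)$th powers of the non-negative integers $< N$.

Theorem 0.1. For any integers $k \geq 1$ and $N \geq 1$ we have
$$\sum_{n=0}^{N-1}n^{k-1} = \frac{1}{k}\big(B_k(N) - B_k\big),$$
where, when $k = 1$, we interpret $0^0$ as $1$.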




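Proof. If $N$ is an integer $\geq 1$, then
$$\sum_{k\geq 0}\big(B_k(N) - B_k\big)\frac{x^k}{k!} = \frac{x(e^{Nx}-1)}{e^x-1} = x\sum_{n=0}^{N-1}e^{nx} = \sum_{n=0}^{N-1}\sum_{j\geq 0}n^j\frac{x^{j+1}}{j!}.$$
Therefore for any integer $N \geq 1$ we obtain
$$\sum_{k\geq 0}\big(B_k(N) - B_k\big)\frac{x^k}{k!} = \sum_{k\geq 1}\Bigg(k\sum_{n=0}^{N-1}n^{k-1}\Bigg)\frac{x^k}{k!}$$
by letting $k = j + 1$ (note that $x^{j+1}/j! = (j+1)\,x^{j+1}/(j+1)!$). The result follows by comparing the coefficients on both sides.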
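Theorem 0.1 is easy to test with exact rational arithmetic. The sketch below (our own; the names `bernoulli_numbers` and `bernoulli_poly` are hypothetical) recomputes the $B_n$ from their recurrence, evaluates $B_k(t)$ via (0.6.1), and compares $(B_k(N) - B_k)/k$ with $\sum_{n=0}^{N-1}n^{k-1}$. Python's convention that `0 ** 0 == 1` matches the reading of the $n = 0$ term when $k = 1$.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n_max: int) -> list[Fraction]:
    """B_0..B_{n_max} via B_n = -1/(n+1) * sum_{s<n} binom(n+1, s) B_s."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, s) * B[s] for s in range(n)) / (n + 1))
    return B


def bernoulli_poly(k: int, t, B: list[Fraction]) -> Fraction:
    """B_k(t) = sum_{n=0}^{k} binom(k, n) B_n t^(k-n), as in (0.6.1)."""
    t = Fraction(t)
    return sum(comb(k, n) * B[n] * t ** (k - n) for n in range(k + 1))


B = bernoulli_numbers(12)
for k in range(1, 13):
    for N in range(1, 20):
        # Includes the n = 0 term, which is 0**0 = 1 when k = 1.
        power_sum = sum(Fraction(n) ** (k - 1) for n in range(N))
        assert power_sum == (bernoulli_poly(k, N, B) - B[k]) / k
```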


Negative powers. A key quantity in number theory is the infinite sum
$$\zeta(k) = 1 + \frac{1}{2^k} + \frac{1}{3^k} + \frac{1}{4^k} + \cdots,$$
which we define for each integer $k \geq 2$. This is called the Riemann zeta-function even though it was first explored by Euler more than a hundred years earlier. Each of these sums is convergent, as each $1/m^k \leq 1/m^2$ for all $m \geq 1$, $k \geq 2$, and $1/m^2 < \int_{m-1}^{m}dt/t^2$ for $m \geq 2$, so that
$$\zeta(k) \leq \zeta(2) = 1 + \sum_{m\geq 2}\frac{1}{m^2} < 1 + \sum_{m\geq 2}\int_{m-1}^{m}\frac{dt}{t^2} = 1 + \int_{1}^{\infty}\frac{dt}{t^2} = 2.$$

We will make a few observations about the values of these sums: 


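Exercise 0.6.2. (a) Prove that $\displaystyle\sum_{m\geq 2}\frac{1}{m(m-1)} = 1$.

(b) Prove that $\displaystyle\sum_{k\geq 2}\frac{1}{m^k} = \frac{1}{m(m-1)}$ for each integer $m \geq 2$.

(c) Deduce that $\displaystyle\sum_{k\geq 2}\big(\zeta(k) - 1\big) = 1$.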

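Exercise 0.6.3. Let $P$ be the set of perfect powers $> 1$. Let $N$ be the set of integers $> 1$ that are not perfect powers (so that $P \cup N$ is a partition of the integers $> 1$).

(a) Prove that $P = \{n^k : n \in N \text{ and } k \geq 2\}$ and $\{n^k : n \in N \text{ and } k \geq 1\} = \{m \geq 2\}$.

(b) Prove that $\displaystyle\sum_{p\in P}\frac{1}{p-1} = \sum_{k\geq 2}\sum_{n\in N}\sum_{j\geq 1}\frac{1}{n^{jk}}$.

(c) Deduce that $\displaystyle\sum_{p\in P}\frac{1}{p-1} = 1$.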


This result was communicated by Goldbach to Euler in 1744. 
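A numerical check of Goldbach's result, offered as a hedged sketch: we list each perfect power $p \leq X$ once (a Python `set` removes repeats such as $16 = 2^4 = 4^2$) and add up $1/(p-1)$. The cutoff $X = 10^8$ and the helper name `perfect_powers_up_to` are our own choices. The omitted tail is roughly $\sum_{n > \sqrt{X}}1/n^2 \approx 10^{-4}$, so the partial sum should fall just short of 1.

```python
def perfect_powers_up_to(X: int) -> set[int]:
    """The perfect powers n^k (n >= 2, k >= 2) with n^k <= X, each listed once."""
    powers: set[int] = set()
    k = 2
    while 2 ** k <= X:
        n = 2
        while n ** k <= X:
            powers.add(n ** k)
            n += 1
        k += 1
    return powers


partial_sum = sum(1 / (p - 1) for p in perfect_powers_up_to(10 ** 8))
print(partial_sum)
```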


0.7. The power series view on the Fibonacci numbers 


An alternate view on Fibonacci numbers, and indeed all second-order linear recurrence sequences, is via their generating functions. For Fibonacci numbers we study the generating function
$$\sum_{n\geq 0}F_nx^n,$$
which is a power series in $x$. Remembering that $F_n - F_{n-1} - F_{n-2} = 0$ for all $n \geq 2$, we then have
$$(1 - x - x^2)\sum_{n\geq 0}F_nx^n = F_0 + (F_1 - F_0)x + (F_2 - F_1 - F_0)x^2 + \cdots + (F_n - F_{n-1} - F_{n-2})x^n + \cdots = x,$$
since $F_0 = 0$ and $F_1 = 1$. Hence if $\alpha = \frac{1+\sqrt5}{2}$ and $\beta = \frac{1-\sqrt5}{2}$, then
$$\sum_{n\geq 0}F_nx^n = \frac{x}{1-x-x^2} = \frac{1}{\alpha-\beta}\left(\frac{1}{1-\alpha x} - \frac{1}{1-\beta x}\right) = \sum_{n\geq 0}\frac{\alpha^n - \beta^n}{\alpha-\beta}x^n,$$
and the result (0.1.1) follows, again. Note that if $x_n = ax_{n-1} + bx_{n-2}$ for all $n \geq 2$, then
$$(1 - at - bt^2)\sum_{n\geq 0}x_nt^n = x_0 + (x_1 - ax_0)t.$$
The sequence $\{x_n\}_{n\geq 0}$ is again determined by the values of $a$, $b$, $x_0$, and $x_1$.
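The identity $(1 - x - x^2)\sum F_nx^n = x$ can also be run backwards: dividing $x$ by $1 - x - x^2$ as formal power series regenerates the Fibonacci numbers. This minimal sketch (the helper `series_divide` is our own, hypothetical name) solves $\mathrm{den}(x)\cdot\mathrm{out}(x) = \mathrm{num}(x)$ one coefficient at a time, assuming the constant term of the denominator is 1. With numerator $x_0 + (x_1 - ax_0)t$ and denominator $1 - at - bt^2$, each step is exactly the recurrence $x_n = ax_{n-1} + bx_{n-2}$.

```python
def series_divide(num: list[int], den: list[int], length: int) -> list[int]:
    """First `length` coefficients of num(x)/den(x), assuming den[0] == 1."""
    assert den[0] == 1
    out: list[int] = []
    for n in range(length):
        c = num[n] if n < len(num) else 0
        # The coefficient of x^n in den * out must equal num[n].
        c -= sum(den[j] * out[n - j] for j in range(1, min(n, len(den) - 1) + 1))
        out.append(c)
    return out


fibonacci = series_divide([0, 1], [1, -1, -1], 15)      # x / (1 - x - x^2)
lucas = series_divide([2, 1 - 1 * 2], [1, -1, -1], 10)  # x_0 = 2, x_1 = 1, a = b = 1
print(fibonacci)
print(lucas)
```

The first list should begin 0, 1, 1, 2, 3, 5, 8, and the second is the companion sequence 2, 1, 3, 4, 7, 11, and so on.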


Exercise 0.7.1. Use this to deduce (0.1.3) when $a^2 + 4b \neq 0$, and exercise 0.1.5(c) when $a^2 + 4b = 0$.


Both of these methods generalize to arbitrary linear recurrences of degree $d$, as follows.

Theorem 0.2. Suppose that $a_1, a_2, \ldots, a_d$ and $x_0, x_1, \ldots, x_{d-1}$ are given and that
$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_dx_{n-d} \quad\text{for all } n \geq d.$$
Factor the following polynomial into linear factors as
$$X^d - a_1X^{d-1} - a_2X^{d-2} - \cdots - a_{d-1}X - a_d = \prod_{j=1}^{k}(X - \alpha_j)^{e_j},$$
where $\alpha_1, \ldots, \alpha_k$ are distinct. Then there exist polynomials $P_1, \ldots, P_k$, each $P_j$ of degree $\leq e_j - 1$, such that
$$x_n = \sum_{j=1}^{k}P_j(n)\alpha_j^n \quad\text{for all } n \geq 0. \tag{0.7.1}$$
The coefficients of the $P_j$ (and the polynomials $P_j$ themselves) can be determined by solving the linear equations obtained by taking this for $n = 0, 1, 2, \ldots, d-1$.


Exercise 0.7.2. Prove that (0.7.1) holds.


Exercise 0.7.3. Let $(x_n)_{n\geq 0}$ be the sequence which begins $x_0 = 0$, $x_1 = 1$ and then $x_n = ax_{n-1} + bx_{n-2}$ for all $n \geq 2$. Its companion sequence, $(y_n)_{n\geq 0}$, begins $y_0 = 2$, $y_1 = a$ and then $y_n = ay_{n-1} + by_{n-2}$ for all $n \geq 2$. For example, $x_n = 2^n - 1$ has companion sequence $y_n = 2^n + 1$.

(a) Prove that $y_n = \alpha^n + \beta^n$ for all $n \geq 0$, and also that $y_n = x_{2n}/x_n$ whenever $x_n \neq 0$.

(b) Let $z_0 = -1$ and $z_n = -bz_{n-1}$ for all $n \geq 1$. Give an explicit formula for $z_n$.

(c) Prove that $x_{m+2n} = y_nx_{m+n} + z_nx_m$ for all $m, n \geq 0$.

(d) Deduce that $F_{n+6} = 4F_{n+3} + F_n$ for all $n \geq 0$.
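A brute-force check of parts (a), (c), and (d) of exercise 0.7.3 for a few choices of $a$ and $b$, as a minimal sketch. The helper names `sequence_pair` and `check_exercise_0_7_3` are hypothetical; the code builds $z_n$ by iterating its definition rather than using a closed form, and it tests part (a) in the multiplied-out form $x_{2n} = y_nx_n$ so that no division is needed.

```python
def sequence_pair(a: int, b: int, length: int) -> tuple[list[int], list[int]]:
    """x_n (x_0 = 0, x_1 = 1) and its companion y_n (y_0 = 2, y_1 = a),
    both satisfying u_n = a*u_{n-1} + b*u_{n-2}."""
    x, y = [0, 1], [2, a]
    while len(x) < length:
        x.append(a * x[-1] + b * x[-2])
        y.append(a * y[-1] + b * y[-2])
    return x, y


def check_exercise_0_7_3(a: int, b: int, size: int = 12) -> None:
    x, y = sequence_pair(a, b, 3 * size)
    z = [-1]
    while len(z) < size:
        z.append(-b * z[-1])  # z_0 = -1, z_n = -b z_{n-1}
    for n in range(size):
        assert x[2 * n] == y[n] * x[n]  # part (a), multiplied through by x_n
        for m in range(size):
            assert x[m + 2 * n] == y[n] * x[m + n] + z[n] * x[m]  # part (c)


check_exercise_0_7_3(1, 1)    # Fibonacci; taking n = 3 in part (c) gives part (d)
check_exercise_0_7_3(3, -2)   # x_n = 2^n - 1, y_n = 2^n + 1
check_exercise_0_7_3(2, -1)   # a^2 + 4b = 0: a repeated root
```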


Appendix 0C. Finding roots of polynomials


In the remaining appendices of this preliminary chapter (chapter 0) we introduce 
several important themes in number theory that do not often appear in a first course 
but will be of interest to some readers. We also take some time to introduce some 
basic notions of algebra that appear (sometimes in disguise) throughout this and 
subsequent number theory courses. To begin with we discuss the famous question 
of techniques for factoring polynomials into their linear factors. 


The reader knows that the roots of a quadratic equation $ax^2 + bx + c = 0$ are
$$\frac{-b \pm \sqrt{\Delta}}{2a}, \quad\text{where } \Delta := b^2 - 4ac$$
is called the discriminant of our polynomial, $ax^2 + bx + c$. The easy way to prove this is to put the equation into a form that is easy to solve: Divide through by $a$, to get $x^2 + (b/a)x + c/a = 0$, so that the leading coefficient is 1. Next make the change of variable, $y = x + b/2a$, to obtain
$$y^2 - \Delta/4a^2 = 0.$$
Having removed the $y^1$ term, we can simply take square roots to obtain the possibilities, $\pm\sqrt{\Delta}/2a$, for $y$, and hence we obtain the possible values for $x$ (since $x = y - b/2a$). Can one similarly find the roots of a cubic?


0.8. Solving the general cubic 


We can certainly begin solving cubics in the same way as we approached quadratics. 


Exercise 0.8.1. Show that the roots of any given cubic polynomial, $Ax^3 + Bx^2 + Cx + D$ with $A \neq 0$, can be obtained from the roots of some cubic polynomial of the form $x^3 + ax + b$, by subtracting $B/3A$ from each root. Moreover write $a$ and $b$ explicitly as functions of $A$, $B$, $C$, and $D$.




We wish to find the roots of $x^3 + ax + b = 0$ for arbitrary $a$ and $b$ (which then allows us to determine the roots of an arbitrary cubic polynomial, by exercise 0.8.1). This does not look so easy since we cannot simply take cube roots unless $a = 0$. Cardano's trick (1545) is a little surprising: Write $x = u + v$ so that
$$x^3 + ax + b = (u+v)^3 + a(u+v) + b = (u^3 + v^3 + b) + (u+v)(3uv + a).$$
This equals 0 when
$$u^3 + v^3 = -b \quad\text{and}\quad 3uv = -a.$$
These conditions imply the simultaneous equations
$$u^3 + v^3 = -b \quad\text{and}\quad u^3v^3 = -a^3/27,$$
so that, as a polynomial in $X$, we have
$$(X - u^3)(X - v^3) = X^2 + bX - a^3/27.$$
Using the formula for the roots of a quadratic polynomial yields
$$u^3, v^3 = \frac{-b \pm \sqrt{b^2 + 4a^3/27}}{2} = \frac{-b \pm \sqrt{\Delta/(-27)}}{2}, \tag{0.8.1}$$
where $\Delta := -4a^3 - 27b^2$ is the discriminant of our polynomial, $x^3 + ax + b$. (The definition and some uses of discriminants are discussed in detail in section 2.11 of appendix 2B.)


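All real numbers have a unique real cube root, call it $t$, and then the other cube roots are $\omega t$ and $\omega^2t$, where $\omega$ is a cube root of 1 other than 1; for instance we may take $\omega = e^{2i\pi/3} = \frac{-1+\sqrt{-3}}{2}$. Therefore if $u^3$ and $v^3$ in (0.8.1) are real (that is, if $\Delta \leq 0$) and $U$ and $V$ are their real cube roots, so that $-3UV$ is real and therefore equal to $a$, then the possible solutions to $u^3 + v^3 = -b$ together with $3uv = -a$ are
$$(u, v) = (U, V),\ (\omega U, \omega^2V),\ \text{and}\ (\omega^2U, \omega V).$$
This implies that the roots of $x^3 + ax + b$ are given by
$$U + V,\quad \omega U + \omega^2V,\quad\text{and}\quad \omega^2U + \omega V.$$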
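Cardano's recipe can be sketched numerically. Unlike the text, which treats the real case, this sketch works directly with complex numbers: it picks one cube root $u$ of a value of $u^3$ from (0.8.1) and then sets $v = -a/(3u)$, so that $3uv = -a$ holds automatically and $v^3$ is forced to be the other value. The function name `cardano_roots` and the rule of taking the value of $u^3$ with the larger absolute value (so that $u \neq 0$ unless $a = b = 0$) are our own assumptions, not part of the original method.

```python
import cmath


def cardano_roots(a: float, b: float) -> list[complex]:
    """Roots of x^3 + a x + b via Cardano's substitution x = u + v."""
    omega = complex(-0.5, 3 ** 0.5 / 2)        # e^{2 pi i / 3}
    delta = -4 * a ** 3 - 27 * b ** 2          # the discriminant
    root = cmath.sqrt(delta / -27)             # sqrt(b^2 + 4a^3/27)
    # Take the value of u^3 from (0.8.1) with the larger absolute value,
    # so that u = 0 only when a = b = 0.
    u3 = (-b + root) / 2
    if abs(-b - root) > abs(-b + root):
        u3 = (-b - root) / 2
    if u3 == 0:
        return [0j, 0j, 0j]                    # the polynomial is x^3
    u = u3 ** (1 / 3)                          # any cube root of u^3 will do
    v = -a / (3 * u)                           # forces 3uv = -a, so v^3 is the other value
    return [u + v, omega * u + omega ** 2 * v, omega ** 2 * u + omega * v]


print(cardano_roots(-7, 6))   # x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3)
```

For this example the three values should be close to 1, 2, and $-3$ in some order, with tiny imaginary parts coming from floating-point rounding.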


The roots of a quadratic polynomial were obtained in terms of integers and 
square roots of integers. We have just seen that the roots of a cubic polynomial can 
be obtained in terms of integers, square roots, and finally cube roots. How about 
the roots of a quartic polynomial? Can these be found in terms of integers, square 
roots, cube roots, and fourth roots? And are there analogous expressions for the 
roots of quintics and higher degree polynomials? 


0.9. Solving the general quartic 


This is bound to be technically complicated, so much so that it is arguably more 
interesting to know that it can be done rather than actually doing it, so we just 
sketch the proofs: 

We begin, as above, by rewriting the equation in the form $x^4 + ax^2 + bx + c = 0$. Following Ferrari (1550s) we add an extra variable $y$ to obtain the equation
$$(x^2 + a + y)^2 = (a + 2y)x^2 - bx + \big((a+y)^2 - c\big) \tag{0.9.1}$$




and then select $y$ so as to make the right-hand side the square of a linear polynomial $rx + s \in \mathbb{C}[x]$, in which case $(x^2 + a + y)^2 = (rx + s)^2$, so that $x$ is a root of one of the quadratic polynomials
$$(x^2 + a + y) \pm (rx + s).$$
A quadratic polynomial is the square of a linear polynomial in $x$ if and only if its discriminant equals 0. The right side of (0.9.1) has discriminant
$$b^2 - 4(a + 2y)\big((a+y)^2 - c\big),$$


a cubic polynomial in y. We can find the roots, y, of this cubic polynomial by the 
method explained in the previous section. Given these roots, we can determine the 
possible values of r and s, and then we can solve for x to find the roots of the 
original equation. 


Example. The roots of $X^4 + 4X^3 - 37X^2 - 100X + 300$. Letting $x = X + 1$ yields $x^4 - 43x^2 - 18x + 360$. Proceeding as above leads to the cubic equation $2y^3 - 215y^2 + 6676y - 64108 = 0$. Dividing through by 2 and then changing variable $y = t + 215/6$ gives the cubic $t^3 - (6169/12)t - (482147/108) = 0$. This has discriminant $4(6169/12)^3 - 27(482147/108)^2 = (2310)^2$. Hence $u^3, v^3 = (482147 \pm 27720\sqrt{-3})/216$. Unusually this has an exact cube root in terms of $\sqrt{-3}$; that is, $u, v = \omega^*(-37 \mp 40\sqrt{-3})/6$, where $\omega^*$ denotes $\omega$ to some power. Now $-3\cdot\frac{-37+40\sqrt{-3}}{6}\cdot\frac{-37-40\sqrt{-3}}{6} = -6169/12 = a$. Therefore we can take $u = (-37 - 40\sqrt{-3})/6$ and $v = (-37 + 40\sqrt{-3})/6$, and the roots of our cubic are $t = u + v = -37/3$, $\omega u + \omega^2v = 157/6$, $\omega^2u + \omega v = -83/6$, so that $y = 47/2, 62, 22$. From these, Ferrari's equation becomes $x^2 - 39/2 = \pm(2x + 9/2)$ for $y = 47/2$, and so the possible roots are $-5, 3$ or $-4, 6$; or $x^2 + 19 = \pm(9x + 1)$ for $y = 62$, and so the possible roots are $-5, -4$ or $3, 6$; or $x^2 - 21 = \pm(x + 9)$ for $y = 22$, and so the possible roots are $-5, 6$ or $3, -4$. For each such $y$ we get the same roots $x = 3, -4, -5, 6$, yielding the roots $X = 2, -5, -6, 5$ of the original quartic.
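The example above can be replayed numerically. In this minimal sketch (the function `ferrari_roots` is our own hypothetical helper), given $y$ we take $r = \sqrt{a + 2y}$ and $s = -b/(2r)$, which makes $(rx + s)^2$ equal to the right side of (0.9.1) exactly when its discriminant vanishes, and then solve the two quadratics $(x^2 + a + y) \pm (rx + s) = 0$. Exact fractions are used to confirm that each of $y = 47/2, 62, 22$ makes the discriminant vanish.

```python
import cmath
from fractions import Fraction


def ferrari_roots(a: float, b: float, c: float, y: float) -> list[complex]:
    """Roots of x^4 + a x^2 + b x + c, given y for which the right side of (0.9.1),
    (a + 2y) x^2 - b x + ((a + y)^2 - c), is the square (r x + s)^2."""
    r = cmath.sqrt(a + 2 * y)                                       # r^2 = a + 2y
    s = -b / (2 * r) if r != 0 else cmath.sqrt((a + y) ** 2 - c)    # 2rs = -b
    roots = []
    for sign in (1, -1):
        # x^2 + a + y = sign * (r x + s), that is, x^2 + p x + q = 0
        p, q = -sign * r, a + y - sign * s
        d = cmath.sqrt(p * p - 4 * q)
        roots += [(-p + d) / 2, (-p - d) / 2]
    return roots


a, b, c = -43, -18, 360          # x^4 - 43x^2 - 18x + 360, from x = X + 1
for y in (Fraction(47, 2), Fraction(62), Fraction(22)):
    # y makes the discriminant b^2 - 4(a + 2y)((a + y)^2 - c) vanish,
    assert b * b - 4 * (a + 2 * y) * ((a + y) ** 2 - c) == 0
    # equivalently, y is a root of 2y^3 - 215y^2 + 6676y - 64108.
    assert 2 * y ** 3 - 215 * y ** 2 + 6676 * y - 64108 == 0
    xs = sorted(round(z.real, 9) for z in ferrari_roots(a, b, c, float(y)))
    print(y, xs, [x - 1 for x in xs])   # the roots x, and X = x - 1
```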


Example. A fun example is to find the fifth roots of unity, other than 1. That is, those $x$ satisfying $\frac{x^5-1}{x-1} = x^4 + x^3 + x^2 + x + 1 = 0$. Proceeding as above we find the four roots
$$\frac{\sqrt5 - 1 \pm \sqrt{-2\sqrt5 - 10}}{4}, \qquad \frac{-\sqrt5 - 1 \pm \sqrt{2\sqrt5 - 10}}{4}.$$


0.10. Surds 


A surd is a square root or a cube root or a higher root, that is, an $n$th root for some integer $n \geq 2$. We have shown above that the roots of degree 2, 3, and 4 polynomials can be determined by taking a combination of surds. We would like to know whether something similar holds for polynomials of degree 5 and higher, which is the focus of a course on Galois theory.


Gauss's favorite example of surds was the expression for $\cos\frac{2\pi}{2^k}$, which we denote by $c(k)$. A double angle formula states that $\cos 2\theta = 2\cos^2\theta - 1$, and so taking $\theta = 2\pi/2^k$ we have
$$c(k-1) = 2c(k)^2 - 1,$$
which may be rewritten as $c(k) = \frac12\sqrt{2 + 2c(k-1)}$. Note that $c(k) > 0$ for $k \geq 3$ and $c(2) = 0$. Hence
$$c(3) = \tfrac12\sqrt2,\quad c(4) = \tfrac12\sqrt{2 + \sqrt2},\quad c(5) = \tfrac12\sqrt{2 + \sqrt{2 + \sqrt2}},$$
and so we deduce by induction that
$$\cos\Big(\frac{2\pi}{2^k}\Big) = \frac12\underbrace{\sqrt{2 + \sqrt{2 + \cdots + \sqrt2}}}_{k-2\ \text{times}} \quad\text{for each } k \geq 3.$$
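The recursion $c(k) = \frac12\sqrt{2 + 2c(k-1)}$ is easy to iterate and compare with the cosine directly. In this minimal sketch (the helper name `nested_surd` is hypothetical) the variable `inner` holds $2c(j)$, starting from $2c(2) = 0$, so after $k - 2$ steps it is the nested surd with $k - 2$ square roots.

```python
import math


def nested_surd(k: int) -> float:
    """(1/2) * sqrt(2 + sqrt(2 + ... + sqrt(2))) with k - 2 square roots, for k >= 3."""
    inner = 0.0                        # 2 c(2) = 0
    for _ in range(k - 2):
        inner = math.sqrt(2 + inner)   # 2 c(j) = sqrt(2 + 2 c(j - 1))
    return inner / 2


for k in range(3, 25):
    assert math.isclose(nested_surd(k), math.cos(2 * math.pi / 2 ** k), rel_tol=1e-12)
```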


Why does expressing the roots of polynomials in terms of surds seem like a 
good idea? Are the roots, given explicitly as in the second example above, any 
more enlightening than simply saying that one has a root of the original equation? 
We can give arbitrarily good approximations to the value of any given irrational 
(and rather rapidly using the right software), so what is really the advantage of 
expressing the roots of polynomials in terms of surds? The answer has more to do with
our comfort with certain concepts, and aesthetics, than any intrinsic notion. In the
rather sophisticated Galois theory there are identifiable differences between these 
different types of expressions, but such concepts are best left to a more advanced 
course. 


One can learn much more about these beautiful classical themes by studying the first six chapters of [Tig16].


References discussing solvability of polynomials 
[1] Raymond G. Ayoub, On the nonsolvability of the general polynomial, Amer. Math. Monthly 89 
(1982), 397-401. 


[2] Harold M. Edwards, The construction of solvable polynomials, Bull. Amer. Math. Soc. 46 (2009), 
no. 3, 397-411. 


[3] Blair K. Spearman and Kenneth S. Williams, Characterization of solvable quintics $x^5 + ax + b$,
Amer. Math. Monthly 101 (1994), 986-992. 


Appendix 0D. What is a group?


Mathematical objects are often structured into groups. It is easiest to prove results for arbitrary groups, so that these results apply for all examples of groups that arise.¹ Many of the main theorems about groups were first proved in a number theory context and then found to apply elsewhere.


0.11. Examples and definitions 


The main examples of groups that you have encountered so far are additive groups 
such as the integers, the rationals, the complex numbers, polynomials of degree at most some given $d$,
and matrices of given dimensions; and multiplicative groups such as the
non-zero rationals, the non-zero complex numbers, and invertible square matrices 
of given dimension. (The integers mod p, a notion we will introduce in chapter 2, 
also give rise to both an additive and a multiplicative group.) 


We will now give the definition of a group—keep in mind the objects named in 
the last paragraph and the usual operations of addition and multiplication: 


A group is defined to be a set of objects $G$ and an operation, call it $*$, such that:

(i) If $a, b \in G$, then $a * b \in G$. We say that $G$ is closed under $*$.

(ii) If $a, b, c \in G$, then $(a * b) * c = a * (b * c)$; that is, when multiplying three elements of $G$ together it does not matter which pair we multiply first. We say that $G$ is associative.

(iii) There exists an element $e \in G$ such that for every $a \in G$ we have $a * e = e * a = a$. We call $e$ the identity element of $G$ for $*$. (For a group in which "$*$" is much like addition we typically denote the identity by 0; for a group in which "$*$" is much like multiplication we typically denote the identity by 1.)

(iv) For every $a \in G$ there exists $b \in G$ such that $a * b = b * a = e$. We call $b$ the inverse of $a$. (For a group in which "$*$" is much like addition we write $-a$ for the inverse of $a$; for a group in which "$*$" is much like multiplication we write $a^{-1}$ for the inverse of $a$.)

¹One can waste a lot of energy giving the same proof, with minor variations, in each situation where a group arises. Gauss wrote Disquisitiones before the abstract notion of a group was formulated and therefore does give very similar proofs in different places when dealing with different examples of groups.


One can check that the examples of groups given above satisfy these criteria. 
We have given examples of both finite and infinite groups. Notice that neither the 
integers nor the polynomials form multiplicative groups, because there is no inverse 
to 2 in the integers and no inverse to x amongst the polynomials. 


There is one familiar property of numbers and polynomials that is not used in the definition of a group, and that is that $a * b = b * a$, that $a$ and $b$ commute. Although this often holds, there are some simple counterexamples; for instance most pairs of 2-by-2 matrices do not commute. For example,
$$\begin{pmatrix}0&1\\0&0\end{pmatrix}\begin{pmatrix}0&0\\1&0\end{pmatrix} = \begin{pmatrix}1&0\\0&0\end{pmatrix} \neq \begin{pmatrix}0&0\\0&1\end{pmatrix} = \begin{pmatrix}0&0\\1&0\end{pmatrix}\begin{pmatrix}0&1\\0&0\end{pmatrix}.$$
We determine exactly which 2-by-2 matrices commute with a given one in the next section. If all pairs of elements of a group commute, then we call the group commutative or abelian. Typically we use multiplicative notation for groups that are non-commutative. It will be useful to develop a theory that works for non-commutative, as well as commutative, groups.


A given group $G$ can contain other, usually smaller, groups $H$, which are called subgroups. Every group $G$ contains the subgroup consisting of just the identity element ($\{0\}$ in additive notation), called the trivial subgroup, and also the subgroup $G$ itself. It can also contain others; any subgroup other than $G$ itself is a proper subgroup. For example the additive group of integers mod 6 with elements $\{0, 1, 2, 3, 4, 5\}$ contains the four subgroups
$$\{0\},\quad \{0, 3\},\quad \{0, 2, 4\},\quad \{0, 1, 2, 3, 4, 5\}.$$
The middle two are non-trivial, proper subgroups. Note that every group, and so every subgroup, contains the identity element. Infinite groups can also contain subgroups; indeed, under addition,
$$\mathbb{C} \supset \mathbb{R} \supset \mathbb{Q} \supset \mathbb{Z}.$$
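The list of subgroups of the integers mod 6 can be confirmed by brute force. This minimal sketch (the helper name `subgroups_mod` is hypothetical) tries every subset of $\{0, 1, \ldots, n-1\}$; for a finite non-empty subset, closure under addition is already enough to make it a subgroup, and the test for 0 is included only to mirror the definition.

```python
from itertools import combinations


def subgroups_mod(n: int) -> list[list[int]]:
    """All subgroups of the additive group of integers mod n, by brute force."""
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            S = set(subset)
            if 0 in S and all((x + y) % n in S for x in S for y in S):
                found.append(sorted(S))
    return found


print(subgroups_mod(6))   # [[0], [0, 3], [0, 2, 4], [0, 1, 2, 3, 4, 5]]
```

This only scales to small $n$ (there are $2^n$ subsets), but it is enough to see that the subgroups of the integers mod $n$ correspond to the divisors of $n$ in these examples.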
Exercise 0.11.1. Prove that if $G$ is a subgroup of $\mathbb{Z}$ under addition, then either $G = \{0\}$ or $G = m\mathbb{Z} := \{mn : n \in \mathbb{Z}\}$ for some integer $m \geq 1$.


0.12. Matrices usually don’t commute 


Let $M_2(\mathbb{C})$ be the set of 2-by-2 matrices with entries in $\mathbb{C}$, and then define
$$\mathrm{Comm}(M) := \{A \in M_2(\mathbb{C}) : AM = MA\}$$
for each $M \in M_2(\mathbb{C})$. It is evident that if $M$ is a multiple of the identity matrix $I$, then $M$ commutes with all of $M_2(\mathbb{C})$. Otherwise $\mathrm{Comm}(M)$ forms a 2-dimensional subspace of (the 4-dimensional) $M_2(\mathbb{C})$, as we now prove:

Proposition 0.12.1. If $M$ is not a multiple of the identity matrix, then $\mathrm{Comm}(M) = \{rI + sM : r, s \in \mathbb{C}\}$.




Exercise 0.12.1. Let $M$ be an $n$-by-$n$ matrix.
(a) Prove that if $A$ and $B$ commute with $M$, then so does $rA + sB$ for any complex numbers $r$ and $s$. (We call $rA + sB$ a linear combination of $A$ and $B$.)
(b) Prove that $M^k$ commutes with $M$, for all $k \geq 0$.
(c) Deduce that all linear combinations of $I, M, \ldots, M^{n-1}$ belong to $\mathrm{Comm}(M)$.

Exercise 0.12.2. Let $M = \begin{pmatrix}a&b\\c&d\end{pmatrix}$.
(a) Prove that $M$ is not a multiple of $I$ if and only if at least one of $a \neq d$, $b \neq 0$, $c \neq 0$ holds.
(b) Prove that if $a \neq d$, then for any matrix $A$ there exist $r, s \in \mathbb{C}$ such that $A - rI - sM$ has zeros down the diagonal.
(c) Prove that if $b \neq 0$, then for any matrix $A$ there exist $r, s \in \mathbb{C}$ such that $A - rI - sM$ has zeros throughout the top row.
(d) Prove that if $c \neq 0$, then for any matrix $A$ there exist $r, s \in \mathbb{C}$ such that $A - rI - sM$ has zeros throughout the first column.


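Proof of Proposition 0.12.1. It is evident that $I$ and $M$ commute with $M$, and hence any linear combination of $I$ and $M$ commutes with $M$, by exercise 0.12.1. We now show that these are the only matrices that commute with $M$.

Let $M = \begin{pmatrix}a&b\\c&d\end{pmatrix}$. If $A \in \mathrm{Comm}(M)$, then $B = A - rI - sM \in \mathrm{Comm}(M)$ for any $r$ and $s \in \mathbb{C}$, by exercise 0.12.1.

If $a \neq d$, then we select $r$ and $s$ as in exercise 0.12.2(b) so that $B = \begin{pmatrix}0&x\\y&0\end{pmatrix}$ for some $x, y \in \mathbb{C}$. As $B \in \mathrm{Comm}(M)$, we have
$$\begin{pmatrix}cx&dx\\ay&by\end{pmatrix} = \begin{pmatrix}0&x\\y&0\end{pmatrix}\begin{pmatrix}a&b\\c&d\end{pmatrix} = BM = MB = \begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}0&x\\y&0\end{pmatrix} = \begin{pmatrix}by&ax\\dy&cx\end{pmatrix}.$$
Comparing the off-diagonal terms on the left- and right-hand ends of the equation forces $x = y = 0$ (as $a \neq d$), so that $B = 0$, and therefore $A = rI + sM$.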


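If $a = d$, $b \neq 0$, and $M \neq aI$, then $B$ may be written in the form $B = \begin{pmatrix}0&0\\x&y\end{pmatrix}$ for some $x, y \in \mathbb{C}$, by exercise 0.12.2(c), so that
$$\begin{pmatrix}0&0\\ax+cy&bx+dy\end{pmatrix} = \begin{pmatrix}0&0\\x&y\end{pmatrix}\begin{pmatrix}a&b\\c&d\end{pmatrix} = BM = MB = \begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}0&0\\x&y\end{pmatrix} = \begin{pmatrix}bx&by\\dx&dy\end{pmatrix}.$$
Comparing the terms in the top row on the left- and right-hand ends of the equation forces $x = y = 0$ (as $b \neq 0$), so that $B = 0$, and therefore $A = rI + sM$.

If $a = d$, $b = 0$, $c \neq 0$, and $M \neq aI$, then we may proceed analogously to the previous paragraph. Alternatively we may note that the result for $M^T$ (the transpose of $M$) is given by the previous paragraph, and then follows for $M$ since $BM = MB$ if and only if $B^TM^T = M^TB^T$.

Finally if $a = d$ and $b = c = 0$, then $M$ is a multiple of $I$. □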


Appendix 0E. Rings and fields


In section 0.11 of appendix 0D we introduced the notion of a group and gave various examples. The real numbers are a set of objects that remarkably support two different groups: There is both the additive group on the real numbers and the multiplicative group on the non-zero real numbers, and this partly explains why they play such a fundamental role in mathematics. In this appendix, we formalize these notions and the key differences between the structure of the real numbers and of the integers. This will allow us to better identify the properties of many important types of numbers that arise in number theory.


0.13. Mixing addition and multiplication together: Rings and fields 


A set of numbers $A$ which, like the reals, has an additive group on $A$ with identity element 0 and a commutative multiplicative group on $A \setminus \{0\}$, and for which the two groups interact according to the distributive properties,
$$a \times (b + c) = (a \times b) + (a \times c) \quad\text{and}\quad (a + b) \times c = (a \times c) + (b \times c),$$
is called a field. The reals $\mathbb{R}$ are an example, as are $\mathbb{C}$ and $\mathbb{Q}$. A field provides the most convenient situation in which to do arithmetic.

Exercise 0.13.1. Prove that $a \times 0 = 0$ for all $a \in A$, when $A$ is a field.


However the integers, $\mathbb{Z}$, which are also vital to arithmetic, do not form a field: Although they form a group under addition, they do not form a group under multiplication, since not every integer has a multiplicative inverse within the integers (for example, the multiplicative inverse of 2 is 1/2 which is not an integer). But you can multiply integers together, and the integers possess a multiplicative identity, 1, so they have some of the properties of a multiplicative group, but not all. The integers are an example of a ring, which is a set of objects that form an additive group, are closed under multiplication, and have a multiplicative identity, 1, as well as satisfying the above distributive properties.² Thus $\mathbb{Z}$ is a commutative ring.

The set of even integers, $2\mathbb{Z}$, narrowly fails to be a ring; it simply lacks the multiplicative identity. The polynomials with integer coefficients, $\mathbb{Z}[x]$, form a ring. Indeed if $A$ is a commutative ring, then $A[x]$ is also a commutative ring.


For a given ring or field $A$ and object $\alpha$ that is not in $A$, we are often interested in what type of mathematical object is created by adjoining $\alpha$ to $A$. This may be done in more than one way:

• $A[\alpha]$, which denotes polynomials in $\alpha$ with coefficients in $A$, that is, expressions of the form $a_0 + a_1\alpha + \cdots + a_d\alpha^d$ for any $d \geq 0$ where each $a_i \in A$.

• $A(\alpha)$, which denotes rational functions in $\alpha$ with coefficients in $A$, or, more simply, quotients $u/v$ with $u, v \in A[\alpha]$ and $v \neq 0$.


For example we will prove in section 3.4 that $\sqrt2$ is irrational (that is, $\sqrt2 \notin \mathbb{Q}$, and so $\sqrt2 \notin \mathbb{Z}$), so we would like to understand the sets $\mathbb{Q}(\sqrt2)$ and $\mathbb{Z}[\sqrt2]$.


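Exercise 0.13.2. (a) Prove that $\mathbb{Z}[\sqrt2] = \{a + b\sqrt2 : a, b \in \mathbb{Z}\}$ and that $\mathbb{Z}[\sqrt2]$ is a ring.
(b) Prove that $\mathbb{Q}(\sqrt2) = \mathbb{Q}[\sqrt2] = \{a + b\sqrt2 : a, b \in \mathbb{Q}\}$ and that $\mathbb{Q}(\sqrt2)$ is a field.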
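The heart of exercise 0.13.2(b) is that a non-zero $a + b\sqrt2$ has an inverse of the same shape, via $(a + b\sqrt2)(a - b\sqrt2) = a^2 - 2b^2$, which is non-zero unless $a = b = 0$ because $\sqrt2$ is irrational. A minimal sketch of this arithmetic with exact rationals follows; the class `QSqrt2` is a hypothetical helper of ours, not a library type.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """The number a + b*sqrt(2) with rational a, b (hypothetical helper class)."""
    a: Fraction
    b: Fraction

    def __add__(self, other: "QSqrt2") -> "QSqrt2":
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other: "QSqrt2") -> "QSqrt2":
        # (a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self) -> "QSqrt2":
        # 1/(a + b√2) = (a - b√2)/(a^2 - 2b^2); the denominator is non-zero
        # unless a = b = 0, because sqrt(2) is irrational.
        norm = Fraction(self.a) ** 2 - 2 * Fraction(self.b) ** 2
        if norm == 0:
            raise ZeroDivisionError("0 has no multiplicative inverse")
        return QSqrt2(self.a / norm, -self.b / norm)


x = QSqrt2(Fraction(3), Fraction(2))
print(x.inverse(), x * x.inverse())
```

Here $x = 3 + 2\sqrt2$ has inverse $3 - 2\sqrt2$, and the product prints as the element $1 + 0\sqrt2$.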


0.14. Algebraic numbers, integers, and units, I


If $f(x) = \sum_{j=0}^{d}f_jx^j$ where $f_d \neq 0$, then $f(x)$ has degree $d$ and leading coefficient $f_d$. We say that $f(x)$ is monic if $f_d = 1$. If all of the coefficients, $f_j$, of $f(x)$ are integers, then we write $f(x) \in \mathbb{Z}[x]$. The content of $f(x) \in \mathbb{Z}[x]$ is the largest integer that divides all of its coefficients. If $m$ is the content of $f(x)$, then $f(x) = mg(x)$ for some $g(x) \in \mathbb{Z}[x]$ of content 1. Obviously $m$ divides every value $f(n)$ with $n \in \mathbb{Z}$, but there could be other integers that also have this property. For example, $f(x) = x^2 + x + 2$ has content 1, but 2 divides $f(n)$ for every integer $n$.

We call $\alpha \in \mathbb{C}$ an algebraic number if it is a root of a non-zero polynomial $f(x) \in \mathbb{Z}[x]$, with integer coefficients. If $f$ is monic, then $\alpha$ is an algebraic integer. We call $f(x)$ the minimal polynomial for $\alpha$ if $f$ is the polynomial with integer coefficients, of smallest degree, with positive leading coefficient and of content 1, for which $f(\alpha) = 0$. Minimal polynomials are irreducible in $\mathbb{Z}[x]$, for if $f(x) = g(x)h(x)$ with $g(x), h(x) \in \mathbb{Z}[x]$ both of smaller degree than $f$, then $g(\alpha)h(\alpha) = f(\alpha) = 0$, so that either $g(\alpha) = 0$ or $h(\alpha) = 0$; and therefore $\alpha$ is a root of a polynomial of lower degree than $f$, contradicting minimality.


Exercise 0.14.1. Let $f(x)$ be the minimal polynomial of an algebraic number $\alpha$.

(a) Prove that if $g(x)$ is a polynomial with integer coefficients for which $g(\alpha) = 0$, then $f(x)$ divides $g(x)$. (You may use Proposition 2.10.1 of appendix 2B in your proof.)

(b) Prove that if $f(x)$ divides $g(x) \in \mathbb{Z}[x]$ and $g$ is monic, then $f$ is monic. Deduce that if $g(\alpha) = 0$, then $\alpha$ is an algebraic integer.

(c) Prove that if $g(\alpha) = 0$ and $g$ is irreducible, then $g = \kappa f$ for some constant $\kappa \neq 0$.

(d) Prove that $f(x)$ is the only minimal polynomial of $\alpha$.

(e) Prove that $(x - \alpha)^2$ does not divide $f(x)$.


²For the sake of comparison, a ring does not necessarily have two of the properties of a field: The numbers in a ring do not necessarily have a multiplicative inverse, and they do not necessarily commute when multiplying them together.




Exercise 0.14.2. Prove that if $\alpha$ is an algebraic number and a root of $f(x) \in \mathbb{Z}[x]$ where $f$ has leading coefficient $a$, then $a\alpha$ is an algebraic integer.

Exercise 0.14.3. What are the algebraic integers in $\mathbb{Q}$?

Exercise 0.14.4. (a) Prove that $\mathbb{Z}[\sqrt d]$ is a subset of the algebraic integers, for any integer $d$.
(b) Prove that $\mathbb{Z}[\sqrt2]$ is the set of algebraic integers in $\mathbb{Q}(\sqrt2)$.


(c) Prove that $\frac{1+\sqrt5}{2}$ is an algebraic integer.


If $\alpha$ is an algebraic integer, then so is $m\alpha + n$ for any integers $m, n$: this is trivial if $m = 0$, and otherwise, if $f(x)$ is the minimal polynomial of $\alpha$ and has degree $d$, then $F(x) := m^df\big(\frac{x-n}{m}\big)$ is a monic polynomial in $\mathbb{Z}[x]$ with root $m\alpha + n$.

If $\alpha$ is a non-zero algebraic number, with minimal polynomial $f(x)$ of degree $d$, then $1/\alpha$ is a root of $x^df(1/x)$.

Exercise 0.14.5. (a) Prove that $1/\alpha$ has minimal polynomial $\pm x^df(1/x)$, with the sign chosen to make the leading coefficient positive.
(b) Prove that $\alpha$ and $1/\alpha$ are both algebraic integers if and only if $f$ is monic and $f(0) = 1$ or $-1$. In this case $\alpha$ and $1/\alpha$ are called units.


Another way to view this is that $\alpha$ is a unit if and only if $\alpha$ divides 1, for if $\beta = 1/\alpha$, then $\alpha\beta = 1$ and $\alpha$ and $\beta$ are both algebraic integers.


Exercise 0.14.6. Suppose that $\alpha$ and $\beta$ are algebraic integers such that $\alpha$ divides $\beta$, and $\beta$ divides $\alpha$. Prove that there exists a unit $u$ for which $\beta = u\alpha$.


In section 0.17 of appendix 0F we will prove that if $\alpha$ and $\beta$ are algebraic numbers, then so are $\alpha + \beta$ and $\alpha\beta$. Moreover if $\alpha$ and $\beta$ are algebraic integers, then so are $\alpha + \beta$ and $\alpha\beta$.


Exercise 0.14.7. (a) Prove that if $\alpha$ is an algebraic number, then $\mathbb{Q}(\alpha)$ is a field.
(b) Prove that if $\alpha$ is an algebraic integer, then $\mathbb{Z}[\alpha]$ is a ring.


Do there exist numbers $\alpha$ that are irrational (that is, that are not rational), so that, for instance, $\mathbb{Q}(\alpha)$ is not the same thing as $\mathbb{Q}$? To determine what numbers are irrational we should first classify, in a useful way, the rational numbers. The minimal polynomial of a rational number $p/q$ with $(p, q) = 1$ and $q > 0$ is $qx - p \in \mathbb{Z}[x]$. Therefore one way to show that an algebraic number is irrational is to show that its minimal polynomial has degree $> 1$. So given a polynomial, say $x^2 - 2$, we have to decide whether it is the minimal polynomial for some number, or perhaps prove that it is irreducible (so that it cannot have a rational root). We will develop number-theoretic tools to do this. Another way to find irrational numbers is to show that there are numbers that are not the roots of any polynomial in $\mathbb{Z}[x]$. Such numbers are not only irrational but are not even algebraic numbers, and so are called transcendental. It is not too difficult to prove that transcendental numbers exist (by the "diagonalization argument" given in section 11.16 of appendix 11D), but it is rather more subtle to determine an actual transcendental number (though we will do so, using number-theoretic ideas, in chapter 11).


For much, much more on university-level algebra, much of which stems from number theory, the reader might care to look at the excellent textbook [DF04] by Dummit and Foote or the more advanced but number-theoretic [IR90].


Appendix 0F. Symmetric polynomials


It is difficult to work with algebraic numbers since one cannot necessarily evaluate them precisely. For example the golden ratio, $\frac{1+\sqrt5}{2}$, can easily be well-approximated, but how can you determine its precise value (since it is irrational)?

We can often avoid working with the algebraic numbers themselves, and instead work with the set of all of the roots of the minimal polynomial. For example, the formula for the $n$th Fibonacci number involves both the golden ratio and the other root of its minimal polynomial $x^2 - x - 1$. It was Sir Isaac Newton who recognized that a polynomial expression with integer coefficients that is symmetric in all of the roots of a given monic polynomial with integer coefficients is a rational number.


0.15. The theory of symmetric polynomials 


We say that $P(x_1, x_2, \ldots, x_n)$ is a symmetric polynomial if
$$P(x_k, x_2, \ldots, x_{k-1}, x_1, x_{k+1}, \ldots, x_n) = P(x_1, x_2, \ldots, x_n) \quad\text{for each } k.$$
Here we swapped $x_1$ and $x_k$ and kept everything else the same.

Exercise 0.15.1. Show that for any permutation $\sigma$ of $1, 2, \ldots, n$ and any symmetric polynomial $P$ we have $P(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}) = P(x_1, x_2, \ldots, x_n)$.

Theorem 0.3 (The fundamental theorem of symmetric polynomials). For a given monic polynomial $f(x) = \sum_{i=0}^{d}a_ix^i$ (so that $a_d = 1$) with integer coefficients, each symmetric polynomial in the roots of $f$, with integer coefficients, can be expressed as a polynomial in the $a_i$ with rational coefficients.


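Proof. Let $f(x) = \prod_{i=1}^{d}(x - \alpha_i)$. We begin by proving the claim for the power sums
$$s_k := \sum_{i=1}^{d}\alpha_i^k \quad\text{for each } k \geq 0.$$
Multiplying out $f(x) = \prod_{i=1}^{d}(x - \alpha_i)$ we have
$$\sum_i\alpha_i = -a_{d-1},\quad \sum_{i<j}\alpha_i\alpha_j = a_{d-2},\quad \sum_{i<j<k}\alpha_i\alpha_j\alpha_k = -a_{d-3},\quad \ldots,\quad \alpha_1\alpha_2\cdots\alpha_d = (-1)^da_0.$$
Then, since $\frac{f'(x)}{f(x)} = \sum_{i=1}^{d}\frac{1}{x - \alpha_i}$, we have (substituting $1/x$ for $x$, dividing by $x$, and multiplying numerator and denominator by $x^d$)
$$\frac{\sum_{j=0}^{d}ja_jx^{d-j}}{\sum_{j=0}^{d}a_jx^{d-j}} = \sum_{i=1}^{d}\frac{1}{1 - \alpha_ix} = \sum_{i=1}^{d}\sum_{k\geq 0}(\alpha_ix)^k = \sum_{k\geq 0}s_kx^k.$$
This implies that
$$\sum_{j=0}^{d}(d-j)a_{d-j}x^j = \sum_{j=0}^{d}a_{d-j}x^j\cdot\sum_{k\geq 0}s_kx^k = \sum_{N\geq 0}\Bigg(\sum_{i=0}^{\min(N,d)}a_{d-i}s_{N-i}\Bigg)x^N.$$
Comparing the coefficients of $x^N$, we obtain (as $a_d = 1$)
$$s_N = -\sum_{i=1}^{\min(N,d)}a_{d-i}s_{N-i} + \begin{cases}(d-N)a_{d-N} & \text{if } N \leq d,\\ 0 & \text{if } N > d.\end{cases}$$
Hence, by induction on $N$, we see that the $s_N$ are polynomials in the $a_i$.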
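The recurrence for the power sums (Newton's identities, in this form) translates directly into code. In this minimal sketch the helper name `power_sums` is hypothetical; the coefficient list is `a[0], ..., a[d]` with `a[d] == 1`, matching $f(x) = \sum_{i=0}^{d}a_ix^i$.

```python
def power_sums(a: list[int], n_max: int) -> list[int]:
    """s_0, ..., s_{n_max} for the roots of the monic f(x) = sum_j a[j] x^j, via
    s_N = (d - N) a_{d-N} [if N <= d] - sum_{i=1}^{min(N,d)} a_{d-i} s_{N-i}."""
    d = len(a) - 1
    if a[d] != 1:
        raise ValueError("f must be monic")
    s: list[int] = []
    for N in range(n_max + 1):
        value = (d - N) * a[d - N] if N <= d else 0
        value -= sum(a[d - i] * s[N - i] for i in range(1, min(N, d) + 1))
        s.append(value)
    return s


# f(x) = (x - 1)(x - 2)(x + 3) = x^3 - 7x + 6, so s_N = 1 + 2^N + (-3)^N
print(power_sums([6, -7, 0, 1], 6))
# f(x) = x^2 - x - 1: power sums of the golden ratio and its conjugate
print(power_sums([-1, -1, 1], 10))
```

The second list consists of integers even though the roots are irrational, as the theorem predicts: 2, 1, 3, 4, 7, 11, and so on.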


We now sketch a proof of Newton's result for arbitrary symmetric polynomials, by showing that every symmetric polynomial in the roots of $f$, with integer coefficients, can be written as a polynomial in the $s_i$ with rational coefficients. We proceed by induction on the number $r$ of variables in the monomials of the symmetric polynomial; that is, we select the monomials $c\,\alpha_{i_1}^{k_1}\alpha_{i_2}^{k_2}\cdots\alpha_{i_r}^{k_r}$ in our symmetric polynomial, with each $k_j \geq 1$, for which $r$ is maximal. In the $r = 1$ case, our polynomial is simply a linear combination of the $s_i$. Suppose that $r > 1$. If the $k_j$ are distinct in such a monomial, we subtract $c\,s_{k_1}s_{k_2}\cdots s_{k_r}$ and we are left with various cross terms, in all of which two or more of the indices $i_j$ are equal. If the $k_j$ are not all distinct, then we subtract $c\,s_{k_1}s_{k_2}\cdots s_{k_r}/\prod_i m_i!$, where $m_i := \#\{j : k_j = i\}$, to obtain various cross terms with the same property. Hence, in the remaining expression, each monomial contains fewer variables, and the result follows by induction.


Exercise 0.15.2. If $f$ is not monic, develop analogous results by working with $g(x)$ defined by $g(a_dx) = a_d^{d-1}f(x)$.


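Example. Look at $\sum_{i,j,k\ \text{distinct}}\alpha_i\alpha_j^2\alpha_k^3$. Subtract $s_1s_2s_3 = \sum_{i,j,k}\alpha_i\alpha_j^2\alpha_k^3$ (the sum over all triples of indices), and we have to account for the cases where $i = j$ or $i = k$ or $j = k$. Hence
$$\sum_{i,j,k\ \text{distinct}}\alpha_i\alpha_j^2\alpha_k^3 = s_1s_2s_3 - 2\sum_{i<k}\alpha_i^3\alpha_k^3 - \sum_{i\neq j}\alpha_i^4\alpha_j^2 - \sum_{i\neq j}\alpha_i\alpha_j^5 - s_6,$$
where the first sum comes from the triples with $i = j \neq k$ (each unordered pair $\{i, k\}$ arising twice), the second from $i = k \neq j$, the third from $j = k \neq i$, and the last from $i = j = k$. Proceeding the same way again we have
$$\sum_{i\neq j}\alpha_i^4\alpha_j^2 = s_4s_2 - s_6,\qquad \sum_{i\neq j}\alpha_i\alpha_j^5 = s_1s_5 - s_6,\qquad\text{and}\qquad \sum_{i<k}\alpha_i^3\alpha_k^3 = (s_3^2 - s_6)/2,$$
the last since in $s_3^2$ the cross term $\alpha_i^3\alpha_k^3$ appears also as $\alpha_k^3\alpha_i^3$. Collecting this all together yields
$$\sum_{i,j,k\ \text{distinct}}\alpha_i\alpha_j^2\alpha_k^3 = s_1s_2s_3 - s_1s_5 - s_2s_4 - s_3^2 + 2s_6.$$
Throughout these calculations, the sum of the indices in each term is 6, the degree of the original polynomial.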
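A brute-force check of this example, as a minimal sketch: it compares the sum of $r_ir_j^2r_k^3$ over ordered triples of distinct indices with $s_1s_2s_3 - s_1s_5 - s_2s_4 - s_3^2 + 2s_6$ for random integer "roots". The helper names `m321` and `power_sum`, and the random test ranges, are our own hypothetical choices.

```python
import random
from itertools import permutations


def m321(roots: list[int]) -> int:
    """Sum of r_i * r_j**2 * r_k**3 over ordered triples of distinct indices i, j, k."""
    return sum(x * y ** 2 * z ** 3 for x, y, z in permutations(roots, 3))


def power_sum(roots: list[int], k: int) -> int:
    return sum(r ** k for r in roots)


random.seed(0)
for _ in range(200):
    roots = [random.randint(-9, 9) for _ in range(random.randint(0, 6))]
    s = {k: power_sum(roots, k) for k in range(1, 7)}
    assert m321(roots) == s[1] * s[2] * s[3] - s[1] * s[5] - s[2] * s[4] - s[3] ** 2 + 2 * s[6]
```

Since `permutations` works with positions rather than values, repeated roots are handled correctly, and with fewer than three roots both sides are 0.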


0.16. Some special symmetric polynomials 


If $\alpha$ and $\beta$ are the two roots of a monic quadratic polynomial with integer coeffi-
cients, then $x_n=(\alpha^n-\beta^n)/(\alpha-\beta)$ is a symmetric function in $\alpha$ and $\beta$ and hence
must be an integer by the fundamental theorem of symmetric polynomials. (We
saw in exercise 0.1.4(c) that this is the $n$th term of the general second-order linear
recurrence sequence that starts 0, 1.)
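For a concrete instance, write the monic quadratic as $x^2-bx-c$ (this parametrization is our choice), so that $\alpha+\beta=b$ and $\alpha\beta=-c$; then $x_{n+1}=bx_n+cx_{n-1}$, which visibly produces integers. The Python sketch below (function names hypothetical) builds the sequence from this recurrence and compares it with a floating-point evaluation of $(\alpha^n-\beta^n)/(\alpha-\beta)$.

```python
import math


def recurrence_terms(b, c, count):
    """x_0, ..., x_{count-1} for x^2 - b*x - c, via x_{n+1} = b*x_n + c*x_{n-1}, x_0 = 0, x_1 = 1."""
    xs = [0, 1]
    while len(xs) < count:
        xs.append(b * xs[-1] + c * xs[-2])
    return xs[:count]


def closed_form(b, c, n):
    """(alpha^n - beta^n)/(alpha - beta) in floating point; assumes b^2 + 4c > 0."""
    root = math.sqrt(b * b + 4 * c)
    alpha, beta = (b + root) / 2, (b - root) / 2   # the roots of x^2 - b*x - c
    return (alpha ** n - beta ** n) / (alpha - beta)


fib = recurrence_terms(1, 1, 10)   # x^2 - x - 1 gives the Fibonacci numbers
assert all(abs(closed_form(1, 1, n) - fib[n]) < 1e-9 for n in range(10))
```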


If $\alpha$ is a root of an irreducible polynomial $f(x)=a\prod_{i=1}^{d}(x-\alpha_i)$, then there
are two symmetric polynomials of particular interest:

The trace of $\alpha$ is $\alpha_1+\alpha_2+\cdots+\alpha_d$, the sum of the roots of $f$.

The norm of $\alpha$ is $\alpha_1\alpha_2\cdots\alpha_d$, the product of the roots of $f$.
By the fundamental theorem of symmetric polynomials, the trace and the norm of
$\alpha$ are rational numbers.


Exercise 0.16.1. Show that if $f(t)=\prod_{j=1}^{d}(t-\alpha_j)\in\mathbb{Z}[t]$, then $\prod_{j=1}^{d}f'(\alpha_j)$ is an integer, by
using the theory of symmetric polynomials.


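Using the product rule we see that

$$f'(t)=a\sum_{j=1}^{d}\prod_{\substack{1\leq i\leq d\\ i\neq j}}(t-\alpha_i)\quad\text{and so}\quad f'(\alpha_j)=a\prod_{\substack{1\leq i\leq d\\ i\neq j}}(\alpha_j-\alpha_i).$$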


We deduce that

$$\prod_{j=1}^{d}f'(\alpha_j)=a^d\prod_{1\leq i<j\leq d}\bigl(-(\alpha_i-\alpha_j)^2\bigr).$$

This is a symmetric polynomial in the roots $\alpha_i$, and so by Newton's fundamental
theorem of symmetric polynomials it must be a rational number.


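Let's evaluate this product for the quadratic polynomial $ax^2+bx+c$. If this
has roots $\alpha$ and $\beta$, then $ax^2+bx+c=a(x-\alpha)(x-\beta)$ and so

$$a(\alpha+\beta)=-b\quad\text{and}\quad a\alpha\beta=c.$$

Therefore

$$a^2(\alpha-\beta)^2=(a(\alpha+\beta))^2-4a(a\alpha\beta)=b^2-4ac,$$

the discriminant of the polynomial $ax^2+bx+c$; the product itself is $f'(\alpha)f'(\beta)=-a^2(\alpha-\beta)^2=4ac-b^2$.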




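For the cubic polynomial $x^3+ax+b=(x-\alpha)(x-\beta)(x-\gamma)$ we have

$$\alpha+\beta+\gamma=0,\qquad \alpha\beta+\alpha\gamma+\beta\gamma=a,\qquad\text{and}\qquad \alpha\beta\gamma=-b.$$

But then $\gamma=-(\alpha+\beta)$ so that $\alpha^2+\alpha\beta+\beta^2=-a$ and $\alpha\beta(\alpha+\beta)=b$. Therefore

$$(-1)^3\bigl((\alpha-\beta)(\gamma-\beta)(\alpha-\gamma)\bigr)^2=-\bigl((\alpha-\beta)(\alpha+2\beta)(2\alpha+\beta)\bigr)^2=4a^3+27b^2,$$

which is the product $\prod_j f'(\alpha_j)$; equivalently $\bigl((\alpha-\beta)(\beta-\gamma)(\gamma-\alpha)\bigr)^2=-4a^3-27b^2$, which is the discriminant of the polynomial $x^3+ax+b$.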
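Both formulas can be spot-checked on a cubic with integer roots summing to 0, so that it has the shape $x^3+ax+b$. In the Python sketch below (helper names hypothetical) the roots 1, 2, and $-3$ give $x^3-7x+6$, for which $\prod_{i<j}(\alpha_i-\alpha_j)^2=-4a^3-27b^2$ and $\prod_j f'(\alpha_j)=4a^3+27b^2$.

```python
from itertools import combinations


def cubic_check(r1, r2, r3):
    """For x^3 + a*x + b = (x - r1)(x - r2)(x - r3) with r1 + r2 + r3 = 0, return
    (a, b, product of squared root differences, product of f'(root))."""
    if r1 + r2 + r3 != 0:
        raise ValueError("roots must sum to 0")
    a = r1 * r2 + r1 * r3 + r2 * r3
    b = -r1 * r2 * r3

    squared_diffs = 1
    for u, v in combinations((r1, r2, r3), 2):
        squared_diffs *= (u - v) ** 2

    def f_prime(t):
        return 3 * t * t + a          # derivative of x^3 + a*x + b

    derivative_product = f_prime(r1) * f_prime(r2) * f_prime(r3)
    return a, b, squared_diffs, derivative_product


a, b, disc, prod = cubic_check(1, 2, -3)     # the cubic x^3 - 7x + 6
assert disc == -4 * a ** 3 - 27 * b ** 2     # the discriminant, here 400
assert prod == 4 * a ** 3 + 27 * b ** 2      # the product of f'(alpha_j), here -400
```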


A beautifully symmetric function is given by the Vandermonde matrix. The
3-by-3 version is

$$\begin{pmatrix}1&1&1\\ x&y&z\\ x^2&y^2&z^2\end{pmatrix},$$

which has determinant $(x-y)(y-z)(z-x)$. This is not quite a symmetric function
since swapping any two variables multiplies the determinant by $-1$. (This is also
apparent when swapping any two columns of the matrix.) One intuitive way to
see that $(x-y)(y-z)(z-x)$ is the determinant is by showing that each factor
separately, and then their product, divides the determinant. To begin with, if $x=y$, then the
determinant equals 0 as the first two columns are now equal, which implies that
$x-y$ must be a factor of the determinant. Similarly $x-z$ and $z-y$ also divide the
determinant. If $x$, $y$, and $z$ are variables, then these expressions do not have any
common factors, and so their product divides the determinant. This product has
degree three (adding the degrees of the variables), as does the determinant, so they
can differ by at most a constant factor. The constant factor can be determined by
checking the coefficient of a particular monomial on both sides. For example $x^0y^1z^2$
only arises in the determinant from multiplying out the terms of the main diagonal
and therefore has coefficient 1, and one can equally look for how this monomial
arises in the product: the only way is as $(-y)(-z)(z)$, which also has coefficient 1.⁸

Exercise 0.16.2. Use the same argument to explain that the determinant of the Vandermonde matrix
$V$, where $V_{i,j}=\alpha_i^{j-1}$ for $1\leq i,j\leq d$, is $\prod_{1\leq i<j\leq d}(\alpha_j-\alpha_i)$.


Exercise 0.16.3. Prove Theorem 0.2 when each $e_j=1$ (assuming exercise 0.16.2).


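Now

$$(V^{T}V)_{i,k}=\sum_{j=1}^{d}(V^{T})_{i,j}V_{j,k}=\sum_{j=1}^{d}V_{j,i}V_{j,k}=\sum_{j=1}^{d}\alpha_j^{i-1}\alpha_j^{k-1}=s_{i+k-2}$$

for $1\leq i,k\leq d$ (where $s_0=d$). Hence $(\det V)^2$ is the determinant of the matrix with $(i,k)$th entry
$s_{i+k-2}$.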
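Both facts about $V$ (exercise 0.16.2, and the identity $(\det V)^2=\det(s_{i+k-2})$ just derived) can be checked exactly for sample values. The Python sketch below computes determinants by Gaussian elimination over `fractions.Fraction`; the function names and the sample values $2,3,5,7$ are our own choices.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result                      # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def vandermonde(alphas):
    """V[i][j] = alphas[i] ** j (0-based), i.e. V_{i,j} = alpha_i^{j-1} with 1-based indices."""
    return [[a ** j for j in range(len(alphas))] for a in alphas]


def product_of_differences(alphas):
    """Product of (alpha_j - alpha_i) over i < j."""
    p = 1
    for i, j in combinations(range(len(alphas)), 2):
        p *= alphas[j] - alphas[i]
    return p


def power_sum_matrix(alphas):
    """d-by-d matrix whose (i, k) entry (1-based) is s_{i+k-2}, with s_m the m-th power sum."""
    d = len(alphas)
    s = [sum(a ** m for a in alphas) for m in range(2 * d - 1)]
    return [[s[i + k] for k in range(d)] for i in range(d)]


alphas = [2, 3, 5, 7]
assert det(vandermonde(alphas)) == product_of_differences(alphas)       # 240
assert det(power_sum_matrix(alphas)) == det(vandermonde(alphas)) ** 2   # 57600
```

The 3-by-3 matrix displayed earlier is the transpose of `vandermonde([x, y, z])`, which has the same determinant.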


Exercise 0.16.4. (This question requires some knowledge of linear algebra.) Suppose that $M$
is an $n$-by-$n$ matrix, and let $\mathrm{Comm}(M)$ denote the set of $n$-by-$n$ matrices that commute with $M$.
(a) Prove that if $M$ is a diagonal matrix in which all the diagonal entries are distinct, then
$\mathrm{Comm}(M)$ equals the set of diagonal matrices.
(b) Use exercise 0.16.2 to show that the set of diagonal matrices is then given by $\{a_0I+a_1M+
\cdots+a_{n-1}M^{n-1}: \text{each } a_i\in\mathbb{C}\}$.


⁸ Diehard algebraists might be uncomfortable with this discussion since we ignore ideals that arise
from the gcds of the polynomial factors, but these details can all be justified.




(c) Now let $M$, $N$, and $T$ be $n$-by-$n$ matrices with $T$ invertible. Prove that $M$ and $N$ commute
if and only if $T^{-1}MT$ and $T^{-1}NT$ commute.

(d) Prove that if $M$ is an $n$-by-$n$ matrix with $n$ distinct eigenvalues, then $\mathrm{Comm}(M)=\{a_0I+
a_1M+\cdots+a_{n-1}M^{n-1}: \text{each } a_i\in\mathbb{C}\}$.


0.17. Algebraic numbers, integers, and units, II 


We are now in a position to prove some of the claims of appendix
0E. Suppose that $\alpha$ and $\beta$ are algebraic integers with minimal polynomials $f$ and
$g$. Then

$$\prod_{\substack{u:\,f(u)=0\\ v:\,g(v)=0}}\bigl(x-(u+v)\bigr)=\prod_{u:\,f(u)=0}g(x-u),$$

by exercise 0.14.1(d). This is a symmetric polynomial in the roots $u$ of $f$ and so,
by the fundamental theorem of symmetric polynomials, this is a monic polynomial
with integer coefficients having root $\alpha+\beta$, and so $\alpha+\beta$ is an algebraic integer.
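As an illustration (in floating point, so only a numerical sketch), take $\alpha=\sqrt2$ and $\beta=\sqrt3$, with minimal polynomials $x^2-2$ and $x^2-3$. Multiplying out $\prod_{u,v}(x-(u+v))$ over the four pairs of roots gives coefficients that are integers up to rounding error, namely those of $x^4-10x^2+1$. The helper `poly_from_roots` is our own.

```python
import math


def poly_from_roots(roots):
    """Coefficients, highest degree first, of the product of (x - r) over r in roots."""
    coeffs = [1.0]
    for r in roots:
        # x*p(x) contributes coeffs + [0]; -r*p(x) contributes -r * ([0] + coeffs)
        coeffs = [a - r * b for a, b in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs


roots_f = [math.sqrt(2), -math.sqrt(2)]   # roots u of f(x) = x^2 - 2
roots_g = [math.sqrt(3), -math.sqrt(3)]   # roots v of g(x) = x^2 - 3
coeffs = poly_from_roots([u + v for u in roots_f for v in roots_g])
```

Rounding `coeffs` gives `[1, 0, -10, 0, 1]`, so $\sqrt2+\sqrt3$ is a root of the monic integer polynomial $x^4-10x^2+1$.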
Exercise 0.17.1. (a) Prove that if $\alpha\neq0$ and $\beta$ are algebraic integers, then $\alpha\beta$ is also an
algebraic integer.
(b) Prove that if $\alpha\neq0$ and $\beta$ are algebraic numbers, then $\alpha+\beta$ and $\alpha\beta$ are algebraic numbers.


Exercise 0.17.2. Prove that if $\alpha_1,\dots,\alpha_k$ are algebraic numbers, then $\mathbb{Q}(\alpha_1,\dots,\alpha_k)$ is a field.
These are the number fields.


Let $\overline{\mathbb{Q}}$ denote the set of all algebraic numbers. Evidently if $K$ is any number
field, then $K\subset\overline{\mathbb{Q}}$. It is not difficult to prove that $\overline{\mathbb{Q}}$ is itself a field. Similarly if
$A$ is the set of all algebraic integers, then $A$ is a ring and the algebraic integers
inside a given number field $K$ form a subring, which is precisely $K\cap A$. However
identifying the elements of $K\cap A$ explicitly can be rather more challenging, as we
saw in exercise 0.14.4.


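Rather more interestingly, the roots of any polynomial with coefficients in $\overline{\mathbb{Q}}$
all belong to $\overline{\mathbb{Q}}$.


Proposition 0.17.1. Suppose that $f(x)\in\overline{\mathbb{Q}}[x]$ and that $f(\rho)=0$. Then $\rho\in\overline{\mathbb{Q}}$.
We say that $\overline{\mathbb{Q}}$ is algebraically closed.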


Proof. Suppose that $f(x)=a_0+a_1x+\cdots+a_dx^d$ so that each $a_j$ is an algebraic
number. Suppose that $a_j$ has minimal polynomial $g_j(x)$, and let $A_j$ be the set of
roots of $g_j(x)$. Then $f(x)$ divides the polynomial

$$F(x):=\prod_{\alpha_0\in A_0}\prod_{\alpha_1\in A_1}\cdots\prod_{\alpha_d\in A_d}\bigl(\alpha_0+\alpha_1x+\cdots+\alpha_dx^d\bigr)$$

(it is the factor with each $\alpha_j=a_j$), which is a symmetric polynomial in the elements of each $A_j$ with $0\leq j\leq d$ and
therefore belongs to $\mathbb{Q}[x]$ by the fundamental theorem of symmetric polynomials. Any root of $f(x)$
is a root of $F(x)$ and therefore must be an algebraic number.


For further development of these ideas see chapter 8 of [Tig16].


Appendix 0G. Constructibility 


0.18. Constructible using only compass and ruler 


The ancient Greeks were interested in what could be constructed using only a 
straight edge (sometimes called an “unmarked ruler”, or just plain “ruler”) and a 
compass. Three questions stumped them: 


Quadrature of the circle: 
Draw a square that has area equal to that of a given circle. 


To draw a square whose area is $\pi$ (the same area as a circle of radius 1), we need
to be able to draw a square with sides of length $x$, where
$x$ is a root of the equation $x^2-\pi=0$.


Duplication of the cube: 
Construct a cube that has twice the volume of a given cube. 


If the original cube has side length 1 (and so volume 1), we would need to be able
to construct a cube with sides of length $x$, where
$x$ is a root of the equation $x^3-2=0$.


Trisection of the angle: 
Construct an angle which is one third the size of a given angle. 


Constructing an angle $\theta$ is as difficult as constructing a right-angled triangle con-
taining that angle, that is, the triangle with side lengths $\sin\theta$, $\cos\theta$, 1. Therefore
if we start with angle $3\theta$ and wish to determine the angle $\theta$, then we will need to be
able to determine $\cos\theta$ from $\cos3\theta$ and $\sin3\theta$. But these are linked by the formula
$\cos3\theta=4\cos^3\theta-3\cos\theta$; that is, we need to find the root $x=2\cos\theta$ of $x^3-3x-A=0$,
where $A=2\cos3\theta$. For example, if $\theta=\pi/9$, we will need to be able to construct
a right-angled triangle with a side of length $x/2$ where
$x$ is a root of the equation $x^3-3x-1=0$.
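The identity $2\cos3\theta=(2\cos\theta)^3-3(2\cos\theta)$ behind this reduction can be spot-checked numerically. The Python sketch below (the helper name is ours) does so for $\theta=\pi/9$, where $A=2\cos(\pi/3)=1$, and for a few other angles; floating point only, so this illustrates rather than proves.

```python
import math


def trisection_cubic(x, A):
    """The polynomial x^3 - 3x - A; x = 2cos(theta) is a root when A = 2cos(3*theta)."""
    return x ** 3 - 3 * x - A


theta = math.pi / 9
x = 2 * math.cos(theta)
A = 2 * math.cos(3 * theta)       # 2cos(pi/3) = 1, up to rounding
assert abs(A - 1) < 1e-12
assert abs(trisection_cubic(x, 1)) < 1e-12

for t in (0.1, 0.7, 1.3, 2.9):
    assert abs(trisection_cubic(2 * math.cos(t), 2 * math.cos(3 * t))) < 1e-12
```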




We need to understand the algebra of points that are constructed from given 
points and lengths by “ruler and compass”. Our tools are: 


• An unmarked ruler, which allows us to draw the line between any two given
points and to extend that line as far as we like.


• A compass, which allows us to draw the circle centered on one given point,
of radius a given length, or the distance between two given points.


Proposition 0.18.1. Given a set of points on lines and a set of lengths, any new 
points that can be constructed from these, using only ruler and compass, will have 
coordinates that can be determined as the roots of degree-one or degree-two polyno- 
mials, whose coefficients are rational functions of the already known coordinates. 


Proof. The lines are defined by pairs of points: Given the points $A=(a_1,a_2)$ and
$B=(b_1,b_2)$ the line between them is $(b_1-a_1)(y-a_2)=(b_2-a_2)(x-a_1)$.


Exercise 0.18.1. Show that the coefficients of the equation of this line can be determined by a 
degree-one equation in already known coordinates. 


Exercise 0.18.2. Prove that any two (non-parallel) lines intersect in a point that can be deter- 
mined by a degree-one equation in the coefficients of the equations of the lines. 

Given a length $r$ and a point $C=(c_1,c_2)$, we can draw the circle $(x-c_1)^2+
(y-c_2)^2=r^2$ centered at $C$ of radius $r$.


Exercise 0.18.3. Prove that the points of intersection of this circle with a given line can be given 
by a degree-two equation in already known coordinates. 


Exercise 0.18.4. Prove that the points of intersection of two circles can be given by a degree-two 
equation in already known coordinates. 


Combining all these exercises implies Proposition 0.18.1.
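To see the degree-two step concretely, here is a Python sketch (our own, not from the text) of the situation in exercise 0.18.3. Parametrizing the line through $P$ and $Q$ as $P+t(Q-P)$ and substituting into the circle's equation gives a quadratic in $t$ whose coefficients are polynomials in the known coordinates, so beyond field operations only one square root is needed.

```python
import math


def line_circle_intersections(p, q, c, r):
    """Intersection points of the line through p and q (p != q) with the circle of
    centre c and radius r, listed in order along the direction from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if dx == 0 and dy == 0:
        raise ValueError("p and q must be distinct points")
    fx, fy = p[0] - c[0], p[1] - c[1]
    # Substituting X = p + t(q - p) into |X - c|^2 = r^2 gives A t^2 + B t + C = 0,
    # whose coefficients are polynomials in the known coordinates.
    A = dx * dx + dy * dy
    B = 2 * (dx * fx + dy * fy)
    C = fx * fx + fy * fy - r * r
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                                # the line misses the circle
    root = math.sqrt(disc)                       # the one square root needed
    ts = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    return [(p[0] + t * dx, p[1] + t * dy) for t in ts]
```

For example, the $x$-axis meets the unit circle centred at the origin in $(-1,0)$ and $(1,0)$, while the tangent line $y=1$ meets it in a single point.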


We sketch here how one uses Proposition 0.18.1 to show that the Greeks were
stumped by their three questions for good reason: none of the three is possible.
Proposition 0.18.1 implies that we can draw a square that has area equal to that
of a given circle if and only if $\pi$ can be obtained in terms of a (finite) succession of
roots of linear or quadratic polynomials whose coefficients are already constructed.
If this can be done, then $\pi$ would be the root of some polynomial with rational
coefficients (perhaps of high degree); in other words $\pi$ would be an algebraic number. However Lindemann
proved, in 1882, that $\pi$ is transcendental (as we will discuss in more detail in
appendix 11D).


If $\alpha$ is obtained from a (finite) succession of roots of linear or quadratic poly-
nomials whose coefficients are already constructed, then $\alpha$ is not only an algebraic
number but one can show that its minimal polynomial has degree which is a power
of 2. Both $x^3-2$ and $x^3-3x-1$ are irreducible (which can be shown as in
exercise 3.4.4), and so these are the minimum polynomials for their roots
(by exercise 0.14.1(c)). Therefore one cannot duplicate the cube, nor trisect the
angle $\pi/3$, since the roots of these irreducible polynomials of degree three do not
have minimum polynomials whose degrees are a power of 2.


For further development of these ideas see section 13.3 of [DF04] or section
9.11 of [LR90].


Chapter 1 


The Euclidean algorithm 


1.1. Finding the gcd 


Most readers will know the Euclidean algorithm, used to find the greatest common
divisor (gcd) of two given integers. For example, to determine the greatest common
divisor of 85 and 48, we begin by subtracting the smaller from the larger, 48 from
85, to obtain $85-48=37$. Now $\gcd(85,48)=\gcd(48,37)$, because the common
divisors of 48 and 37 are precisely the same as those of 85 and 48, and so we apply
the algorithm again to the pair 48 and 37. So we subtract the smaller from the
larger to obtain $48-37=11$, so that $\gcd(48,37)=\gcd(37,11)$. Next we should
subtract 11 from 37, but then we would only do so again, and a third time, so let's
do all that in one go and take $37-3\times11=4$, to obtain $\gcd(37,11)=\gcd(11,4)$.
Similarly we take $11-2\times4=3$, and then $4-3=1$, so that the gcd of 85 and 48
is 1. This is the Euclidean algorithm that you might already have seen,¹ but did
you ever prove that it really works?
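The procedure just described, with repeated subtractions of the same number done "in one go" as a single division with remainder, might be sketched in Python as follows; the function name `euclid_steps` is our own, and the built-in `divmod` supplies the quotient and remainder.

```python
def euclid_steps(a, b):
    """Euclidean algorithm for a >= b >= 1: record each step a = q*b + r, return (gcd, steps)."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)          # q subtractions of b, done in one go
        steps.append((a, q, b, r))   # and gcd(a, b) = gcd(b, r)
        a, b = b, r
    return a, steps


g, steps = euclid_steps(85, 48)
for a, q, b, r in steps:
    print(f"{a} = {q} x {b} + {r}")
# 85 = 1 x 48 + 37
# 48 = 1 x 37 + 11
# 37 = 3 x 11 + 4
# 11 = 2 x 4 + 3
# 4 = 1 x 3 + 1
# 3 = 3 x 1 + 0
```

The last non-zero remainder, 1, is the gcd; the quotients 1, 1, 3, 2, 1, 3 reappear later in this chapter.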


To do so, we will first carefully define terms that we have implicitly used in the 
above paragraph, perhaps mathematical terms that you have used for years (such 
as “divides”, “quotient”, and “remainder”) without a formal definition. This may
seem pedantic but the goal is to make sure that the rules of basic arithmetic are 
really established on a sound footing. 


Let $a$ and $b$ be given integers. We say that $a$ is divisible by $b$, or that $b$ divides $a$,²
if there exists an integer $q$ such that $a=qb$. For convenience we write “$b\mid a$”.³ ⁴
We now set an exercise for the reader to check that the definition allows one to 
manipulate the notion of division in several familiar ways. 


Exercise 1.1.1. In this question, and throughout, we assume that $a$, $b$, and $c$ are integers.
(a) Prove that if $b$ divides $a$, then either $a=0$ or $|a|\geq|b|$.


¹ There will be a formal discussion of the Euclidean algorithm in appendix 1A.

² One can also say $a$ is a multiple of $b$ or $b$ is a divisor of $a$ or $b$ is a factor of $a$.

³ And if $b$ does not divide $a$, we write “$b\nmid a$”.

⁴ One reason for giving a precise mathematical definition for division is that it allows us to better
decide how to interpret questions like, “What is 1 divided by 0?” or “What is 0 divided by 0?”




(b) Deduce that if $a\mid b$ and $b\mid a$, then $b=a$ or $b=-a$ (which, in future, we will write as
“$b=\pm a$”).
(c) Prove that if $a$ divides $b$ and $c$, then $a$ divides $bx+cy$ for all integers $x$, $y$.
(d) Prove that $a$ divides $b$ if and only if $a$ divides $-b$ if and only if $-a$ divides $b$.
(e) Prove that if $a$ divides $b$, and $b$ divides $c$, then $a$ divides $c$.
(f) Prove that if $a\neq0$ and $ac$ divides $ab$, then $c$ divides $b$.


Next we formalize the notion of “dividing with remainder”. 


Lemma 1.1.1. If $a$ and $b$ are integers, with $b\geq1$, then there exist unique integers
$q$ and $r$, with $0\leq r\leq b-1$, such that $a=qb+r$. We call $q$ the “quotient”, and $r$
the “remainder”.


Proof by induction. We begin by proving the existence of $q$ and $r$. For each
$b\geq1$, we proceed by induction on $a\geq0$. If $0\leq a\leq b-1$, then the result follows
with $q=0$ and $r=a$. Otherwise assume that the result holds for $0,1,2,\dots,a-1$,
where $a\geq b$. Then $a-1\geq a-b\geq0$ so, by the induction hypothesis, there exist
integers $Q$ and $r$, with $0\leq r\leq b-1$, for which $a-b=Qb+r$. Therefore $a=qb+r$
with $q=Q+1$.

If $a<0$, then $-a>0$ so we have $-a=Qb+R$, for some integers $Q$ and $R$,
with $0\leq R\leq b-1$, by the previous paragraph. If $R=0$, then $a=qb$ where
$q=-Q$ (and $r=0$). Otherwise $1\leq R\leq b-1$ and so $a=qb+r$ with $q=-Q-1$
and $1\leq r=b-R\leq b-1$, as required.

Now we show that $q$ and $r$ are unique. If $a=qb+r=Qb+R$, then $b$ divides
$(q-Q)b=R-r$. However $0\leq r,R\leq b-1$ so that $|R-r|\leq b-1$, and $b\mid R-r$.
Therefore $R-r=0$ by exercise 1.1.1(a), and so $Q-q=0$. In other words $q=Q$
and $r=R$; that is, the pair $q,r$ is unique.


An easier, but less intuitive, proof. We can add a multiple of $b$ to $a$ to get a
positive integer. That is, there exists an integer $n$ such that $a+nb>0$; any integer
$n>-a/b$ will do. We now subtract multiples of $b$ from this number, as long as it
remains non-negative, until subtracting $b$ once more would make it negative. In other
words we now have an integer $a-qb\geq0$, which we denote by $r$, such that $r-b<0$;
in other words $0\leq r\leq b-1$.
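The second proof translates directly into a deliberately naive Python sketch (the function name is ours; it takes about $|a|/b$ steps, so it illustrates the argument rather than offering an efficient method). Its output agrees with Python's built-in `divmod`, which for $b\geq1$ also returns the quotient and a remainder with $0\leq r\leq b-1$.

```python
def quotient_remainder(a, b):
    """Return (q, r) with a = q*b + r and 0 <= r <= b - 1, for integers a and b >= 1.

    Mirrors the 'easier' proof: add a multiple of b to reach a positive number,
    then subtract b for as long as the result stays non-negative."""
    if b < 1:
        raise ValueError("need b >= 1")
    n = max(0, -a // b + 1)          # then n > -a/b, so a + n*b > 0
    value, q = a + n * b, -n         # invariant: a == q*b + value
    while value - b >= 0:
        value -= b
        q += 1
    return q, value
```

For instance `quotient_remainder(-7, 3)` returns `(-3, 2)`, since $-7=(-3)\cdot3+2$.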


Exercise 1.1.2. Suppose that $a\geq1$ and $b\geq2$ are integers. Show that we can write $a$ in base $b$;
that is, show that there exist integers $a_0,a_1,\dots,a_d\in[0,b-1]$ for which $a=a_db^d+a_{d-1}b^{d-1}+\cdots+a_1b+a_0$.
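One natural approach to exercise 1.1.2 is to apply Lemma 1.1.1 repeatedly: divide $a$ by $b$, record the remainder as $a_0$, then repeat with the quotient. A short Python sketch of that idea follows (the function name is ours).

```python
def base_b_digits(a, b):
    """Digits [a_d, ..., a_1, a_0] of a >= 1 in base b >= 2, most significant first."""
    if a < 1 or b < 2:
        raise ValueError("need a >= 1 and b >= 2")
    digits = []
    while a > 0:
        a, remainder = divmod(a, b)   # old a = (new a)*b + remainder, 0 <= remainder <= b - 1
        digits.append(remainder)
    return digits[::-1]
```

For example, $85=1\cdot2^6+0\cdot2^5+1\cdot2^4+0\cdot2^3+1\cdot2^2+0\cdot2+1$, and `base_b_digits(85, 2)` returns `[1, 0, 1, 0, 1, 0, 1]`.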


We say that $d$ is a common divisor of integers $a$ and $b$ if $d$ divides both $a$ and
$b$. We are mostly interested in the greatest common divisor of $a$ and $b$, which we
denote by $\gcd(a,b)$, or more simply as $(a,b)$.⁵ ⁶

We say that $a$ is coprime with $b$, or that $a$ and $b$ are coprime integers, or that
$a$ and $b$ are relatively prime, if $(a,b)=1$.


⁵ In the UK this is known as the highest common factor of $a$ and $b$ and is written hcf$(a,b)$.

⁶ When $a=b=0$, every integer is a divisor of 0, so there is no greatest divisor, and therefore
$\gcd(0,0)$ is undefined. There are often one or two cases in which a generally useful mathematical
definition does not give a unique value. Another example is 0 divided by 0, which we explore in an exercise.
For aesthetic reasons, some authors choose to assign a value which is consistent with the theory
in one situation but perhaps not in another. This can lead to artificial inconsistencies, which is why we
choose to leave such function-values undefined.




Corollary 1.1.1. If $a=qb+r$ where $a$, $b$, $q$, and $r$ are integers, then
$\gcd(a,b)=\gcd(b,r)$.


Proof. Let $g=\gcd(a,b)$ and $h=\gcd(r,b)$. Now $g$ divides both $a$ and $b$, so $g$
divides $a-qb=r$ (by exercise 1.1.1(c)). Therefore $g$ is a common divisor of both $r$
and $b$, and therefore $g\leq h$. Similarly $h$ divides both $b$ and $r$, so $h$ divides $qb+r=a$
and hence $h$ is a common divisor of both $a$ and $b$, and therefore $h\leq g$. We have
shown that $g\leq h$ and $h\leq g$, which together imply that $g=h$.


Corollary 1.1.1 justifies the method used to determine the gcd of 85 and 48 in
the first paragraph of section 1.1, and indeed in general:
Exercise 1.1.3. Use Corollary 1.1.1 to prove that the Euclidean algorithm indeed yields the
greatest common divisor of two given integers. (You might prove this by induction on the smaller
of the two integers.)


Exercise 1.1.4. Prove that $(F_n,F_{n+1})=1$ by induction on $n\geq0$.


1.2. Linear combinations 


The Euclidean algorithm can also be used to determine a linear combination⁷ of
a and b, over the integers, which equals gcd(a, b); that is, one can always use the 
Euclidean algorithm to find integers u and v such that 

(1.2.1) au + bv = gcd(a, b). 

Let us see how to do this in an example, by finding integers $u$ and $v$ such that
$85u+48v=1$; remember that we found the gcd of 85 and 48 at the beginning of
section 1.1. We retrace the steps of the Euclidean algorithm, but in reverse: The
final step was that $1=1\cdot4-1\cdot3$, a linear combination of 4 and 3. The second to
last step used that $3=11-2\cdot4$, and so substituting $11-2\cdot4$ for 3 in $1=1\cdot4-1\cdot3$,
we obtain

$$1=1\cdot4-1\cdot3=1\cdot4-1\cdot(11-2\cdot4)=3\cdot4-1\cdot11,$$

a linear combination of 11 and 4. This then implies, since we had $4=37-3\cdot11$,
that

$$1=3\cdot(37-3\cdot11)-1\cdot11=3\cdot37-10\cdot11,$$

a linear combination of 37 and 11. Continuing in this way, we successively deduce,
using that $11=48-37$ and then that $37=85-48$,

$$\begin{aligned}1&=3\cdot37-10\cdot(48-37)=13\cdot37-10\cdot48\\&=13\cdot(85-48)-10\cdot48=13\cdot85-23\cdot48;\end{aligned}$$

that is, we have the desired linear combination of 85 and 48.


To prove that this method always works, we will use Lemma 1.1.1 again: Sup-
pose that $a=qb+r$ so that $\gcd(a,b)=\gcd(b,r)$ by Corollary 1.1.1, and that we
have $bu-rv=\gcd(b,r)$ for some integers $u$ and $v$. Then


(1.2.2) $\gcd(a,b)=\gcd(b,r)=bu-rv=bu-(a-qb)v=b(u+qv)-av,$


⁷ A linear combination of two given integers $a$ and $b$, over the integers, is a number of the form $ax+by$
where $x$ and $y$ are integers. This can be generalized to yield a linear combination $a_1x_1+\cdots+a_nx_n$
of any finite set of integers, $a_1,\dots,a_n$. Linear combinations are a key concept in linear algebra and
appear (without necessarily being called that) in many courses.




the desired linear combination of a and b. This argument forms the basis of our 
proof of (1.2.1), but to give a complete proof we proceed by induction on the smaller 
of a and b: 


Theorem 1.1. If $a$ and $b$ are positive integers, then there exist integers $u$ and $v$
such that
$$au+bv=\gcd(a,b).$$


Proof. Interchanging $a$ and $b$ if necessary we may assume that $a\geq b\geq1$. We shall
prove the result by induction on $b$. If $b=1$, then $b$ only has the divisor 1, so that

$$\gcd(a,1)=1=0\cdot a+1\cdot1.$$

We now prove the result for $b>1$: If $b$ divides $a$, then

$$\gcd(b,a)=b=0\cdot a+1\cdot b.$$

Otherwise $b$ does not divide $a$ and so Lemma 1.1.1 implies that there exist integers
$q$ and $r$ such that $a=qb+r$ and $1\leq r\leq b-1$. Since $1\leq r<b$ we know, by the
induction hypothesis, that there exist integers $u$ and $v$ for which $bu-rv=\gcd(b,r)$.
The result then follows by (1.2.2).
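The induction in this proof is exactly a recursive algorithm, often called the extended Euclidean algorithm. The Python sketch below (the function name is ours) writes every step in the form $au+bv$; the text's inductive step $bu-rv$ is the same computation with the sign of $v$ absorbed.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b), for integers a, b >= 0, not both 0.

    If a = q*b + r and b*x + r*y = gcd(b, r), then
    gcd(a, b) = b*x + (a - q*b)*y = a*y + b*(x - q*y): the substitution used in (1.2.2)."""
    if b == 0:
        return a, 1, 0                 # gcd(a, 0) = a = a*1 + 0*0
    q, r = divmod(a, b)
    g, x, y = extended_gcd(b, r)
    return g, y, x - q * y
```

For 85 and 48 this returns `(1, 13, -23)`, matching $13\cdot85-23\cdot48=1$ found above.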


We now establish various useful properties of the gcd: 


Exercise 1.2.1. (a) Prove that if $d$ divides both $a$ and $b$, then $d$ divides $\gcd(a,b)$.
(b) Deduce that $d$ divides both $a$ and $b$ if and only if $d$ divides $\gcd(a,b)$.
(c) Prove that $1\leq\gcd(a,b)\leq|a|$ and $\gcd(a,b)\leq|b|$.
(d) Prove that $\gcd(a,b)=|a|$ if and only if $a$ divides $b$.
Exercise 1.2.2. Suppose that $a$ divides $m$, and $b$ divides $n$.
(a) Deduce that $\gcd(a,b)$ divides $\gcd(m,n)$.
(b) Deduce that if $\gcd(m,n)=1$, then $\gcd(a,b)=1$.
Exercise 1.2.3. Show that Theorem 1.1 holds for any integers $a$ and $b$ that are not both 0. (It
is currently stated and proved only for positive integers $a$ and $b$.)


Corollary 1.2.1. If $a$ and $b$ are integers for which $\gcd(a,b)=1$, then there exist
integers $u$ and $v$ such that
$$au+bv=1.$$


This is one of the most useful results in mathematics and has applications in 
many areas, including in safeguarding today’s global communications. For example, 
we will see in section 10.3 that to implement RSA, a key cryptographic protocol
that helps keep important messages safe in our electronic world, one uses Corollary
1.2.1 in an essential way. More on that later, after developing more basic number
theory. 


Exercise 1.2.4. (a) Use exercise 1.1.1(c) to show that if $au+bv=1$, then $(a,b)=(u,v)=1$.
(b) Prove that $\gcd(u,v)=1$ in Theorem 1.1.


Corollary 1.2.2. If gcd(a,m) = gcd(b,m) = 1, then gcd(ab, m) = 1. 


Proof. By Theorem 1.1 there exist integers $r,s,u,v$ such that

$$au+mv=br+ms=1.$$




Therefore

$$ab(ur)+m(bvr+aus+msv)=(au+mv)(br+ms)=1,$$

and the result follows from exercise 1.2.4(a).


Corollary 1.2.3. We have $\gcd(ma,mb)=m\cdot\gcd(a,b)$ for all integers $m\geq1$.


Proof. By Theorem 1.1 there exist integers $u,v$ such that $au+bv=\gcd(a,b)$. Now
$\gcd(ma,mb)$ divides $ma$ and $mb$ so it divides $mau+mbv=m\cdot\gcd(a,b)$. Similarly
$\gcd(a,b)$ divides $a$ and $b$, so that $m\cdot\gcd(a,b)$ divides $ma$ and $mb$, and therefore
divides $\gcd(ma,mb)$ by exercise 1.2.1(a). The result follows from exercise 1.1.1(b), since
the gcd is always positive.


Exercise 1.2.5. (a) Show that if $A$ and $B$ are given integers, not both 0, with $g=\gcd(A,B)$,
then $\gcd(A/g,B/g)=1$.
(b) Prove that any rational number $u/v$ where $u,v\in\mathbb{Z}$ with $v\neq0$ may be written as $r/s$ where
$r$ and $s$ are coprime integers with $s>0$. This is called a reduced fraction.


1.3. The set of linear combinations of two integers 


Theorem 1.1 states that the greatest common divisor of two integers is a linear
combination of those two integers. This suggests that it might be useful to study
the set of linear combinations

$$I(a,b):=\{am+bn:\ m,n\in\mathbb{Z}\}$$

of two given integers $a$ and $b$.⁸ We see that $I(a,b)$ contains $0,a,b,a+b,a+2b,2a+b,
a-b,b-a,\dots$ and any sum of integer multiples of $a$ and $b$, so that
$I(a,b)$ is closed under addition. Let $I(a):=I(a,0)=\{am:\ m\in\mathbb{Z}\}$, the set of
integer multiples of $a$. We now prove that $I(a,b)$ can be described as the set of
integer multiples of $\gcd(a,b)$, a set which is easier to understand:


Corollary 1.3.1. For any given non-zero integers $a$ and $b$, we have
$$\{am+bn:\ m,n\in\mathbb{Z}\}=\{gk:\ k\in\mathbb{Z}\}$$

where $g:=\gcd(a,b)$; that is, $I(a,b)=I(g)$. In other words, there exist integers $m$
and $n$ with $am+bn=c$ if and only if $\gcd(a,b)$ divides $c$.


Proof. By Theorem 1.1 we know that there exist $u,v\in\mathbb{Z}$ for which $au+bv=g$.
Therefore $a(uk)+b(vk)=gk$ so that $gk\in I(a,b)$ for all $k\in\mathbb{Z}$; that is, $I(g)\subseteq I(a,b)$.
On the other hand, as $g$ divides both $a$ and $b$, there exist integers $A,B$ such that
$a=gA$, $b=gB$, and so any $am+bn=g(Am+Bn)\in I(g)$. That is, $I(a,b)\subseteq I(g)$.
The result now follows from the two inclusions.
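Corollary 1.3.1 can be illustrated (not proved) by collecting every value of $am+bn$ that lands in a window, with $m$ and $n$ restricted to a bounded range. In the Python sketch below, the function name and both bounds are our own choices, made so that every multiple of $g$ in the window is actually reached. The smallest positive value found is $g$, which is also the key idea of the non-constructive proof of Theorem 1.1 in this section.

```python
def combinations_in_window(a, b, bound=50, coeff_bound=60):
    """All values a*m + b*n in [-bound, bound] with |m|, |n| <= coeff_bound."""
    values = set()
    for m in range(-coeff_bound, coeff_bound + 1):
        for n in range(-coeff_bound, coeff_bound + 1):
            value = a * m + b * n
            if -bound <= value <= bound:
                values.add(value)
    return values


window = combinations_in_window(12, 18)          # gcd(12, 18) = 6
assert window == {k for k in range(-50, 51) if k % 6 == 0}
assert min(v for v in window if v > 0) == 6      # the smallest positive element is the gcd
```

For the pair 85, 48 (whose gcd is 1) every integer in the window appears.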


It is instructive to see how this result follows directly from the Euclidean algo- 
rithm: In our example, we are interested in gcd(85, 48), so we will study I(85, 48), 
that is, the set of integers of the form 


85m + 48n. 


⁸ This is usually called the ideal generated by $a$ and $b$ in $\mathbb{Z}$ and denoted by $(a,b)_{\mathbb{Z}}$. The notion of
an ideal is one of the basic tools of modern algebra, as we will discuss in appendix 3D.




The first step in the Euclidean algorithm was to write $85=1\cdot48+37$. Substituting
this in above yields

$$85m+48n=(1\cdot48+37)m+48n=48(m+n)+37m,$$
and so $I(85,48)\subseteq I(48,37)$. In the other direction, any integer in $I(48,37)$ can be
written as
$$48a+37b=48a+(85-48)b=85b+48(a-b),$$
and so belongs to $I(85,48)$. Combining these last two statements yields that
$I(85,48)=I(48,37)$.

Each step of the Euclidean algorithm leads to a similar equality, and so we get

$$I(85,48)=I(48,37)=I(37,11)=I(11,4)=I(4,3)=I(3,1)=I(1,0)=I(1).$$
To truly justify this we need to establish an analogous result to Corollary 1.1.1:
Lemma 1.3.1. If $a=qb+r$ where $a$, $b$, $q$, and $r$ are integers, then $I(a,b)=I(b,r)$.


Proof. We begin by noting that
$$am+bn=(qb+r)m+bn=b(qm+n)+rm$$
so that $I(a,b)\subseteq I(b,r)$. In the other direction
$$bu+rv=bu+(a-qb)v=av+b(u-qv)$$


so that $I(b,r)\subseteq I(a,b)$. The result follows by combining the two inclusions.


We have used the Euclidean algorithm to find the gcd of any two given integers
$a$ and $b$, as well as to determine integers $u$ and $v$ for which $au+bv=\gcd(a,b)$.
The price for obtaining the actual values of $u$ and $v$, rather than merely proving
the existence of $u$ and $v$ (which is all that was claimed in Theorem 1.1), was our
somewhat complicated analysis of the Euclidean algorithm. However, if we only
wish to prove that such integers $u$ and $v$ exist, then we can do so with a somewhat
easier proof:


Non-constructive proof of Theorem 1.1.⁹ Let $h$ be the smallest positive inte-
ger that belongs to $I(a,b)$, say $h=au+bv$. Then $g:=\gcd(a,b)$ divides $h$, as $g$
divides both $a$ and $b$.


Now $a=a\cdot1+b\cdot0$ so that $a\in I(a,b)$, and $1\leq h\leq a$ by the definition of $h$.
Therefore Lemma 1.1.1 implies that there exist integers $q$ and $r$, with $0\leq r\leq h-1$,
for which $a=qh+r$. Therefore

$$r=a-qh=a-q(au+bv)=a(1-qu)+b(-qv)\in I(a,b),$$

which contradicts the minimality of $h$, unless $r=0$; that is, $h$ divides $a$. An
analogous argument reveals that $h$ divides $b$, and so $h$ divides $g$ by exercise 1.2.1(a).


⁹ We will now prove the existence of $u$ and $v$ by showing that their non-existence would lead to a
contradiction. We will develop other instances, as we proceed, of both constructive and non-constructive 
proofs of important theorems. 

Which type of proof is preferable? This is somewhat a matter of taste. The non-constructive proof 
is often shorter and more elegant. The constructive proof, on the other hand, is practical—that is, it 
gives solutions. It is also “richer” in that it develops more than is (immediately) needed, though some 
might say that these extras are irrelevant. 

Which type of proof has the greatest clarity? That depends on the algorithm devised for the con- 
structive proof. A compact algorithm will often cast light on the subject. But a cumbersome one may 
obscure it. In this case, the Euclidean algorithm is remarkably simple and efficient ([Sha85, p. 11]).




Hence g divides h, and h divides g, and g and h are both positive, so that g = h 
as desired. 


We say that the integers a, b, and c are relatively prime if gcd(a, b,c) = 1. We 
say that they are pairwise coprime if gcd(a,b) = gcd(a,c) = gcd(b,c) = 1. For 
example, 6,10,15 are relatively prime, but they are not pairwise coprime (since 
each pair of integers has a common factor > 1). 

Exercise 1.3.1. Suppose that a, b, and c are non-zero integers for which a+ b= c. 
(a) Show that a,b,c are relatively prime if and only if they are pairwise coprime. 
(b) Show that (a,b) = (a,c) = (b,c). 
(c) Show that the analogy to (a) is false for integer solutions a, b,c,d to a+b =c+d (perhaps 
by constructing a counterexample). 


1.4. The least common multiple 


The least common multiple¹⁰ of two given integers $a$ and $b$ is defined to be the
smallest positive integer that is a multiple of both $a$ and $b$. We denote this by
$\mathrm{lcm}[a,b]$ (or simply $[a,b]$). We now prove the counterpart to exercise 1.2.1(a):


Lemma 1.4.1. $\mathrm{lcm}[a,b]$ divides integer $m$ if and only if $a$ and $b$ both divide $m$.


Proof. Since $a$ and $b$ divide $\mathrm{lcm}[a,b]$, if $\mathrm{lcm}[a,b]$ divides $m$, then $a$ and $b$ both
divide $m$, by exercise 1.1.1(e).


On the other hand suppose $a$ and $b$ both divide $m$, and write $m=q\,\mathrm{lcm}[a,b]+r$
where $0\leq r<\mathrm{lcm}[a,b]$. Now $a$ and $b$ both divide $m$ and $\mathrm{lcm}[a,b]$ so they both
divide $m-q\,\mathrm{lcm}[a,b]=r$. However $\mathrm{lcm}[a,b]$ is defined to be the smallest positive
integer that is divisible by both $a$ and $b$, which implies that $r$ must be 0. Therefore
$\mathrm{lcm}[a,b]$ divides $m$.


The analogies to exercise 1.2.1(d) and Corollary 1.2.3 for lcms are given by the
following two exercises: 


Exercise 1.4.1. Prove that $\mathrm{lcm}[m,n]=n$ if and only if $m$ divides $n$.


Exercise 1.4.2. Prove that $\mathrm{lcm}[ma,mb]=m\cdot\mathrm{lcm}[a,b]$ for any positive integer $m$.


1.5. Continued fractions 


Another way to write Lemma 1.1.1 is that for any given integers $a>b\geq1$ with
$b\nmid a$, there exist integers $q$ and $r$, with $b>r\geq1$, for which

$$\frac{a}{b}=q+\frac{1}{b/r}.$$

This is admittedly a strange way to write things, but repeating this process with
the pair of integers $b$ and $r$, and then again, will eventually lead us to an interesting
representation of the original fraction $a/b$. Working with our original example, in
which we found the gcd of 85 and 48, we can represent $85=48+37$ as

$$\frac{85}{48}=1+\frac{1}{48/37},$$


¹⁰ Sometimes called the lowest common multiple.




and the next step, $48=37+11$, as

$$\frac{48}{37}=1+\frac{1}{37/11},\quad\text{so that}\quad \frac{85}{48}=1+\frac{1}{48/37}=1+\cfrac{1}{1+\cfrac{1}{37/11}}.$$


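The remaining steps of the Euclidean algorithm may be rewritten as

$$\frac{37}{11}=3+\frac{1}{11/4},\qquad \frac{11}{4}=2+\frac{1}{4/3},\qquad\text{and}\qquad \frac{4}{3}=1+\frac{1}{3},$$

so that

$$\frac{85}{48}=1+\cfrac{1}{1+\cfrac{1}{3+\cfrac{1}{2+\cfrac{1}{1+\cfrac{1}{3}}}}}.$$

This is the continued fraction for $85/48$ and is conveniently written as $[1,1,3,2,1,3]$.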
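Since the partial quotients are just the quotients produced by the Euclidean algorithm, computing a continued fraction takes only a few lines. The Python sketch below (function names ours) also evaluates $[a_0,\dots,a_k]$ exactly from the inside out using `fractions.Fraction`, recovering $85/48$.

```python
from fractions import Fraction


def continued_fraction(a, b):
    """Partial quotients [a_0, a_1, ..., a_k] of a/b (a, b >= 1), via the Euclidean algorithm."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    return quotients


def evaluate(quotients):
    """Exact value of [a_0, ..., a_k] = a_0 + 1/(a_1 + 1/(... + 1/a_k)), built from the inside out."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value
```

Here `continued_fraction(85, 48)` returns `[1, 1, 3, 2, 1, 3]`, and `evaluate` of that list gives `Fraction(85, 48)`.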


Notice that this is the sequence of quotients $a_i$ from the various divisions; that is,

$$\frac{a}{b}=[a_0,a_1,a_2,\dots,a_k]=a_0+\cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{\ddots+\cfrac{1}{a_k}}}}.$$

The $a_i$'s are called the partial quotients of the continued fraction.


Exercise 1.5.1. (a) Show that if $a_k>1$, then $[a_0,a_1,\dots,a_k]=[a_0,a_1,\dots,a_k-1,1]$.
(b) Prove that the set of positive rational numbers is in 1-1 correspondence with the finite
length continued fractions that do not end in 1.


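We now list the rationals that correspond to the first few entries in our contin-
ued fraction $[1,1,3,2,1,3]$. We have $[1]=1$, $[1,1]=2$, and

$$1+\cfrac{1}{1+\cfrac{1}{3}}=\frac{7}{4},\qquad 1+\cfrac{1}{1+\cfrac{1}{3+\cfrac{1}{2}}}=\frac{16}{9},\qquad 1+\cfrac{1}{1+\cfrac{1}{3+\cfrac{1}{2+\cfrac{1}{1}}}}=\frac{23}{13}.$$

These yield increasingly good approximations to $85/48=1.770833\ldots$, that is, in
decimal notation,
$$1,\quad 2,\quad 1.75,\quad 1.777\ldots,\quad 1.7692\ldots.$$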


We call these $p_j/q_j$, $j\geq0$, the convergents for the continued fraction, defined by

$$\frac{p_j}{q_j}=[a_0,a_1,a_2,\dots,a_j],$$

since they converge to $a/b=p_k/q_k$, for some $k$. Do you notice anything surprising
about the convergents for $85/48$? In particular the previous one, namely $23/13$?
When we worked through the Euclidean algorithm we found that $13\cdot85-23\cdot48=1$;
could it be a coincidence that these same numbers show up again in this new
context? In section 1.8 of appendix 1A we show that this is no coincidence; indeed
we always have

$$p_jq_{j-1}-p_{j-1}q_j=(-1)^{j-1},$$

so, in general, taking $a=p_k$ and $b=q_k$, if $u=(-1)^{k-1}q_{k-1}$ and $v=(-1)^kp_{k-1}$, then
$$au+bv=1.$$
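The convergents can be generated with the recurrence $p_j=a_jp_{j-1}+p_{j-2}$, $q_j=a_jq_{j-1}+q_{j-2}$, starting from $p_{-2}=0$, $p_{-1}=1$, $q_{-2}=1$, $q_{-1}=0$. This recurrence is standard but is not stated in this part of the text (the details are deferred to appendix 1A), so we use it here as an assumption. The Python sketch below (function names ours) recomputes the convergents of $[1,1,3,2,1,3]$ and the pair $u$, $v$ from the formula above.

```python
def convergents(quotients):
    """Numerators p_j and denominators q_j of [a_0, ..., a_j], via the standard recurrence
    p_j = a_j p_{j-1} + p_{j-2} and q_j = a_j q_{j-1} + q_{j-2}."""
    p_prev2, p_prev1 = 0, 1      # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0      # q_{-2}, q_{-1}
    ps, qs = [], []
    for a_j in quotients:
        p_j = a_j * p_prev1 + p_prev2
        q_j = a_j * q_prev1 + q_prev2
        ps.append(p_j)
        qs.append(q_j)
        p_prev2, p_prev1 = p_prev1, p_j
        q_prev2, q_prev1 = q_prev1, q_j
    return ps, qs


def bezout_from_convergents(quotients):
    """(u, v) = ((-1)^(k-1) q_{k-1}, (-1)^k p_{k-1}), so that p_k*u + q_k*v = 1; needs k >= 1."""
    k = len(quotients) - 1
    if k < 1:
        raise ValueError("need at least two partial quotients")
    ps, qs = convergents(quotients)
    return (-1) ** (k - 1) * qs[k - 1], (-1) ** k * ps[k - 1]


ps, qs = convergents([1, 1, 3, 2, 1, 3])
# ps == [1, 2, 7, 16, 23, 85] and qs == [1, 1, 4, 9, 13, 48]
```

Then `bezout_from_convergents([1, 1, 3, 2, 1, 3])` returns `(13, -23)`, the same pair found by running the Euclidean algorithm backwards.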


When one studies this in detail, one finds that the continued fraction is really
just a convenient reworking of the Euclidean algorithm (as we explained it above)
for finding $u$ and $v$. Bachet de Méziriac¹¹ introduced this method to Renaissance
mathematicians in the second edition of his brilliantly named book Pleasant and
delectable problems which are made from numbers (1624). Such methods had been
known from ancient times, certainly to the Indian scholar Aryabhata in 499 A.D.,
probably to Archimedes in Syracuse (Greece) in 250 B.C., and possibly to the
Babylonians as far back as 1700 B.C.¹²


1.6. Tiling a rectangle with squared] 


Given a 48-by-85 rectangle we will tile it, greedily, with squares. The largest square 
that we can place inside a 48-by-85 rectangle is a 48-by-48 square. This 48-by-48 
square goes from top to bottom of the rectangle, and if we place it at the far right, 
then we are left with a 37-by-48 rectangle to tile, on the left. 


Figure 1.1. Partitioning a rectangle into squares, using the Euclidean algorithm.


If we place a 37-by-37 square at the top of this rectangle, then we are left with an 
11-by-37 rectangle in the bottom left-hand corner. We can now place three 11-by-11 
squares inside this, leaving a 4-by-11 rectangle. We finish this off with two 4-by-4 
squares, one 3-by-3 square, and finally three 1-by-1 squares. 


¹¹ The celebrated editor and commentator on Diophantus, whom we will meet again in chapter 6.

¹² There are cuneiform clay tablets from this era that contain related calculations. It is known that after conquering Babylon in 331 B.C., Alexander the Great ordered his archivist Callisthenes and his tutor Aristotle to supervise the translation of the Babylonian astronomical records into Greek. It is therefore feasible that Archimedes was introduced to these ideas from this source. Indeed, Pythagoras's Theorem may be misnamed, as the Babylonians knew of integer-sided right-angled triangles like 3, 4, 5 and 5, 12, 13 more than one thousand years before Pythagoras (570–495 B.C.) was born.

¹³ Thanks to Dusa McDuff and Dylan Thurston for bringing this beautiful application to my attention.




The area of the rectangle can be computed in terms of the areas of each of the squares; that is,

$$85 \cdot 48 = 1 \cdot 48^2 + 1 \cdot 37^2 + 3 \cdot 11^2 + 2 \cdot 4^2 + 1 \cdot 3^2 + 3 \cdot 1^2.$$


What has this to do with the Euclidean algorithm? Hopefully the reader has recognized the same sequence of numbers and quotients that appeared above, when we computed gcd(85, 48). This is no coincidence. At a given step we have an a-by-b rectangle, with a > b ≥ 1, and then we can remove q b-by-b squares, where a = qb + r with 0 ≤ r ≤ b − 1, leaving an r-by-b rectangle, and so proceed by induction.
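The greedy tiling is easy to simulate. Below is a minimal Python sketch (the function name `greedy_squares` is ours, chosen for illustration). It records how many squares of each size are cut off. Those counts are the quotients from the Euclidean algorithm, and summing the areas of the squares recovers 85·48.

```python
def greedy_squares(a, b):
    """Greedily tile an a-by-b rectangle (a >= b >= 1) with squares.

    Returns a list of (side, count) pairs. At each stage we cut off as many
    b-by-b squares as fit, which is the quotient q in a = q*b + r, and then
    continue with the leftover r-by-b rectangle.
    """
    tiles = []
    while b > 0:
        q, r = divmod(a, b)      # a = q*b + r with 0 <= r <= b - 1
        tiles.append((b, q))     # q squares of side b
        a, b = b, r              # continue with the leftover b-by-r rectangle
    return tiles


tiles = greedy_squares(85, 48)
print(tiles)   # [(48, 1), (37, 1), (11, 3), (4, 2), (3, 1), (1, 3)]
print(85 * 48 == sum(count * side ** 2 for side, count in tiles))   # True
```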


Exercise 1.6.1. Given an a-by-b rectangle show how to write a·b as a sum of squares, as above, in terms of the partial quotients and convergents of the continued fraction for a/b.


Exercise 1.6.2. (a) Use this to show that $F_{n+1}F_n = F_1^2 + F_2^2 + \cdots + F_n^2$, where $F_n$ is the nth Fibonacci number (see section 0.1 for the definition and a discussion of Fibonacci numbers and exercise 0.4.12(b) for a generalization of this exercise).


(b)† Find the correct generalization to more general second-order linear recurrence sequences.


Additional exercises 


Exercise 1.7.1. (a) Does 0 divide 0? (Use the definition of “divides”.)
(b) Show that there is no unique meaning to 0/0.
(c) Prove that if b divides a and b ≠ 0, then there is a unique meaning to a/b.


Exercise 1.7.2. Prove that if a and b are not both 0, then gcd(a, b) is a positive integer. 


Exercise 1.7.3. Prove that if m and n are coprime positive integers, then $\frac{(m+n-1)!}{m!\,n!}$ is an integer.


Exercise 1.7.4. Suppose that a = qb + r with 0 ≤ r ≤ b − 1.

(a) Let [t] be the integer part of t, that is, the largest integer ≤ t. Prove that q = [a/b].

(b) Let {t} be the fractional part of t, that is, {t} = t − [t]. Prove that r = b{r/b} = b{a/b}.
(Beware of these functions applied to negative numbers: e.g., [−3.14] = −4 not −3, and {−3.14} = .86 not .14.)


Exercise 1.7.5.† (a) Show that if n is an integer, then {n + α} = {α} and [n + α] = n + [α] for all α ∈ R.

(b) Prove that [α + β] − [α] − [β] = 0 or 1 for all α, β ∈ R, and explain when each case occurs.

(c) Deduce that {α} + {β} − {α + β} = 0 or 1 for all α, β ∈ R, and explain when each case occurs.

(d) Show that {α} + {−α} = 1 unless α is an integer, in which case it equals 0.
(e) Show that if a ∈ Z and r ∈ R∖Z, then [r] + [a − r] = a − 1.


Exercise 1.7.6. Suppose that d is a positive integer and that N, x > 0.
(a) Show that there are exactly [x] positive integers ≤ x.
(b) Show that kd is the largest multiple of d that is ≤ N, where k = [N/d].
(c) Deduce that there are exactly [N/d] positive integers n ≤ N which are divisible by d.


Exercise 1.7.7. Prove that $\sum_{i=0}^{n-1} [\alpha + i/n] = [n\alpha]$ for any real number α and integer n ≥ 1.


Exercise 1.7.8. Suppose that a + b = c and let g = gcd(a, b). Prove that we can write a = gA, b = gB, and c = gC, where A + B = C and A, B, and C are pairwise coprime integers.


Exercise 1.7.9. Prove that if (a, b) = 1, then (a + b, a − b) = 1 or 2.


Exercise 1.7.10. Prove that for any given integers b > a ≥ 1 there exists an integer solution u, w to au − bw = gcd(a, b) with 0 ≤ u ≤ b − 1 and 0 ≤ w ≤ a − 1.




Exercise 1.7.11.† Show that if gcd(a, b) = 1, then $\gcd(a^k, b^\ell) = 1$ for all integers k, ℓ ≥ 1.


Exercise 1.7.12. Let m and n be positive integers. What fractions do the two lists 1/m, 2/m, ..., (m − 1)/m and 1/n, 2/n, ..., (n − 1)/n have in common (when the fractions are reduced)?


Exercise 1.7.13. Suppose m and n are coprime positive integers. When the fractions 1/m, 2/m, ..., (m − 1)/m, 1/n, 2/n, ..., (n − 1)/n are put in increasing order, what is the shortest distance between two consecutive fractions?


Given a 7-liter jug and a 5-liter jug one can measure 1 liter of water as follows: 
Fill the 5-liter jug, and pour the contents into the 7-liter jug. Fill the 5-liter jug 
again, use this to fill the 7-liter jug, so we are left with 3 liters in the 5-liter jug 
and the 7-liter jug is full. Empty the 7-liter jug, pour the contents of the 5-liter jug 
into the 7-liter jug, and refill the 5-liter jug. We now have 3 liters in the 7-liter jug. 
Fill the 7-liter jug using the 5-liter jug; we have poured 4 liters from the 5-liter jug 
into the 7-liter jug, so that there is just 1 liter left in the 5-liter jug! Notice that 
we filled the 5-liter jug 3 times and emptied the 7-liter jug twice, and so we used 
here that 3 × 5 − 2 × 7 = 1. We have wasted 2 × 7 liters of water in this process.
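The pouring procedure above can be simulated directly. In this minimal Python sketch (the function and parameter names are ours), we only ever refill one jug from the tap and pour it into the other. We empty the second jug whenever it is full. The function counts how often the first jug was filled and how often the second became full, and these counts recover the identity 3 × 5 − 2 × 7 = 1. As written, it assumes the two jug sizes are coprime.

```python
from math import gcd


def measure_one_liter(fill_size, other_size):
    """Simulate the jug procedure described above.

    We only ever refill the jug of size `fill_size` from the tap, pour it into
    the jug of size `other_size`, and empty the latter whenever it is full.
    Stops as soon as the refilled jug holds exactly 1 liter.

    Returns (fills, fulls): how often the first jug was filled and how often
    the second jug became full, so fills*fill_size - fulls*other_size == 1.
    """
    if gcd(fill_size, other_size) != 1:
        raise ValueError("jug sizes must be coprime")
    here = there = 0                  # current contents of the two jugs
    fills = fulls = 0
    while here != 1:
        if here == 0:                 # refill the first jug from the tap
            here = fill_size
            fills += 1
        elif there == other_size:     # second jug is full: pour it away
            there = 0
        else:                         # pour as much as fits into the second jug
            pour = min(here, other_size - there)
            here -= pour
            there += pour
            if there == other_size:
                fulls += 1
    return fills, fulls


print(measure_one_liter(5, 7))   # (3, 2), matching 3 x 5 - 2 x 7 = 1
```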


Exercise 1.7.14. (a) Since 3 × 7 − 4 × 5 = 1, describe how we can proceed by filling the 7-liter jug each time rather than filling the 5-liter jug.
(b) Can you measure 1 liter of water using a 25-liter jug and a 17-liter jug?
(c)† Prove that if m and n are positive coprime integers, then you can measure one liter of water using an m-liter jug and an n-liter jug.
(d) Prove that one can do this wasting less than mn liters of water.


Exercise 1.7.15. Can you weigh 1 lb of tea using scales with 25-lb and 17-lb weights? 


The definition of a set of linear combinations can be extended to an arbitrary set of integers (in place of the set {a, b}); that is,

$$I(a_1, \ldots, a_k) = \{a_1 m_1 + a_2 m_2 + \cdots + a_k m_k : m_1, m_2, \ldots, m_k \in \mathbb{Z}\}.$$


Exercise 1.7.16. Show that $I(a_1, \ldots, a_k) = I(g)$ for any non-zero integers $a_1, \ldots, a_k$, where we have $g = \gcd(a_1, \ldots, a_k)$.


Exercise 1.7.17.† Deduce that if we are given integers $a_1, a_2, \ldots, a_k$, not all zero, then there exist integers $m_1, m_2, \ldots, m_k$ such that

$$m_1 a_1 + m_2 a_2 + \cdots + m_k a_k = \gcd(a_1, a_2, \ldots, a_k).$$

We say that the integers $a_1, a_2, \ldots, a_k$ are relatively prime if $\gcd(a_1, a_2, \ldots, a_k) = 1$. We say that they are pairwise coprime if $\gcd(a_i, a_j) = 1$ whenever i ≠ j. Note that 6, 10, 15 are relatively prime, but not pairwise coprime (since each pair of integers has a common factor > 1).
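One natural way to find such $m_1, \ldots, m_k$ is to fold in one integer at a time, using $\gcd(a_1, \ldots, a_k) = \gcd(\gcd(a_1, \ldots, a_{k-1}), a_k)$. Here is a minimal Python sketch of that approach. It assumes the standard extended Euclidean algorithm for two integers, and the helper names `ext_gcd` and `ext_gcd_many` are ours. This is one approach among several, not the book's prescribed method.

```python
def ext_gcd(a, b):
    """Extended Euclidean algorithm: return (g, u, v) with a*u + b*v == g == gcd(a, b)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:                       # make the gcd non-negative
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v


def ext_gcd_many(nums):
    """Return (g, ms) with sum(m * a for m, a in zip(ms, nums)) == g == gcd(nums).

    Folds in one integer at a time and rescales the coefficients found so far.
    """
    g, ms = 0, []
    for a in nums:
        g, u, v = ext_gcd(g, a)         # g_old * u + a * v == g_new
        ms = [m * u for m in ms] + [v]
    return g, ms


print(ext_gcd_many([6, 10, 15]))   # (1, [-14, 7, 1]): 6*(-14) + 10*7 + 15*1 = 1
```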
Exercise 1.7.18. Prove that if $g = \gcd(a_1, a_2, \ldots, a_k)$, then $\gcd(a_1/g, a_2/g, \ldots, a_k/g) = 1$.
Exercise 1.7.19.† (a) Prove that abc = lcm[a, b, c] · gcd(ab, bc, ca).


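(b)‡ Prove that if r + s = n, then

$$a_1 \cdots a_n = \operatorname{lcm}\Big[\prod_{i \in I} a_i : I \subset \{1, \ldots, n\},\ |I| = r\Big] \cdot \gcd\Big(\prod_{j \in J} a_j : J \subset \{1, \ldots, n\},\ |J| = s\Big).$$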




Throughout this book we will present more challenging exercises in the final part of each chapter. If several of the questions concern a single topic, then they will be presented as a separate subsection:


Divisors in recurrence sequences 


We begin by noting that for any integer d ≥ 1 we have the polynomial identity

$$x^d - y^d = (x - y)(x^{d-1} + x^{d-2}y + \cdots + xy^{d-2} + y^{d-1}). \tag{1.7.1}$$

Hence if r and s are integers, then r − s divides $r^d - s^d$. (This also follows from Corollary 2.3.1 in the next chapter.)


Exercise 1.7.20. (a) Prove that if m | n, then $2^m - 1$ divides $2^n - 1$.
(b)‡ Prove that if n = qm + r with 0 ≤ r ≤ m − 1, then there exists an integer Q such that $2^n - 1 = Q(2^m - 1) + (2^r - 1)$ (and note that $0 \le 2^r - 1 < 2^m - 1$).
(c)† Use the Euclidean algorithm to show that $\gcd(2^n - 1, 2^m - 1) = 2^{\gcd(n,m)} - 1$.
(d) What is the value of $\gcd(N^n - 1, N^m - 1)$ for arbitrary integer N ≠ −1, 0, or 1?


In exercise 0.4.15(a) we saw that the Mersenne numbers $M_n = 2^n - 1$ (of the previous exercise) are an example of a second-order linear recurrence sequence. We will show that an analogous result holds for any second-order linear recurrence sequence that begins 0, 1, .... For the rest of this section we assume that a and b are coprime integers with $x_0 = 0$, $x_1 = 1$ and that $x_n = a x_{n-1} + b x_{n-2}$ for all n ≥ 2.


Exercise 1.7.21. Use exercise 0.4.10(a) to show that $\gcd(x_m, x_n) = \gcd(x_m, x_{m+1} x_{n-m})$ whenever n > m.


Exercise 1.7.22.† Prove that if m | n, then $x_m \mid x_n$; that is, $\{x_n : n \ge 0\}$ is a division sequence.


Exercise 1.7.23.† Assume that (a, b) = 1.

(a) Prove that $\gcd(x_n, b) = 1$ for all n ≥ 1.

(b) Prove that $\gcd(x_n, x_{n-1}) = 1$ for all n ≥ 1.

(c) Prove that if n > m, then $(x_n, x_m) = (x_{n-m}, x_m)$.
(d) Deduce that $(x_n, x_m) = x_{(n,m)}$.
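Exercise 1.7.23(d) can be tested numerically before it is proved. The following Python sketch (helper names ours) compares $\gcd(x_m, x_n)$ with $|x_{\gcd(m,n)}|$ for small indices. We take absolute values because, for some choices of a and b, terms can be negative. With (a, b) = (1, 1) this is the Fibonacci sequence. With (a, b) = (3, −2) it reproduces the Mersenne numbers $2^n - 1$ of exercise 1.7.20. A check over a finite range is of course not a proof.

```python
from math import gcd


def recurrence(a, b, count):
    """First `count` terms of x_0 = 0, x_1 = 1, x_n = a*x_{n-1} + b*x_{n-2}."""
    xs = [0, 1]
    while len(xs) < count:
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[:count]


def has_gcd_property(a, b, limit=40):
    """Check gcd(x_m, x_n) == |x_gcd(m, n)| for all 1 <= m, n < limit."""
    xs = recurrence(a, b, limit)
    return all(gcd(xs[m], xs[n]) == abs(xs[gcd(m, n)])
               for m in range(1, limit) for n in range(1, limit))


print(recurrence(3, -2, 8))                              # [0, 1, 3, 7, 15, 31, 63, 127]
print(has_gcd_property(1, 1), has_gcd_property(3, -2))   # True True
print(has_gcd_property(2, 2))                            # False: here gcd(a, b) = 2
```

The last line shows why the coprimality assumption matters. With a = b = 2 we get $x_2 = 2$ and $x_3 = 6$, so $\gcd(x_2, x_3) = 2 \ne x_1$.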


Exercise 1.7.24.† For any given integer d ≥ 2, let $m = m_d$ be the smallest positive integer for which d divides $x_m$. Prove that d divides $x_n$ if and only if $m_d$ divides n.


It is sometimes possible to reverse the direction in the defining recurrence relation for a recurrence sequence; that is, if b = 1, then (0.1.2) can be rewritten as $x_{n-2} = -a x_{n-1} + x_n$. So if $x_0 = 0$ and $x_1 = 1$, then $x_{-1} = 1$, $x_{-2} = -a$, .... We now try to understand the terms $x_{-n}$.

Exercise 1.7.25. Let us suppose that $x_n = a x_{n-1} + x_{n-2}$ for all integers n, both positive and negative, with $x_0 = 0$ and $x_1 = 1$. Prove, by induction on n ≥ 1, that $x_{-n} = (-1)^{n-1} x_n$ for all n ∈ Z.


Appendix 1A. Reformulating the Euclidean algorithm


In section 1.5 we saw that the Euclidean algorithm may be usefully reformulated
in terms of continued fractions. In this appendix we reformulate the Euclidean 
algorithm in two further ways: firstly, in terms of matrix multiplication, which 
makes many of the calculations easier; and secondly, in terms of a dynamical system, 
which will be useful later when we develop similar ideas in a more general context. 


1.8. Euclid matrices and Euclid’s algorithm 


In discussing the Euclidean algorithm we showed that gcd(85, 48) = gcd(48, 37) from noting that 85 − 1·48 = 37. In this we changed our attention from the pair 85, 48 to the pair 48, 37. Writing this down using matrices, we performed this change via the map

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} \mapsto \begin{pmatrix} 48 \\ 37 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 85 \\ 48 \end{pmatrix}.$$


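Next we went from the pair 48, 37 to the pair 37, 11 via the map

$$\begin{pmatrix} 48 \\ 37 \end{pmatrix} \mapsto \begin{pmatrix} 37 \\ 11 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 48 \\ 37 \end{pmatrix},$$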


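and then from the pair 37, 11 to the pair 11, 4 via the map

$$\begin{pmatrix} 37 \\ 11 \end{pmatrix} \mapsto \begin{pmatrix} 11 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 37 \\ 11 \end{pmatrix}.$$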


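We can compose these maps so that

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} \mapsto \begin{pmatrix} 48 \\ 37 \end{pmatrix} \mapsto \begin{pmatrix} 37 \\ 11 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 48 \\ 37 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 85 \\ 48 \end{pmatrix}$$

and then

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} \mapsto \begin{pmatrix} 11 \\ 4 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 37 \\ 11 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 85 \\ 48 \end{pmatrix}.$$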


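Continuing on to the end of the Euclidean algorithm, via 11 = 2·4 + 3, 4 = 1·3 + 1, and 3 = 3·1 + 0, we have

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 85 \\ 48 \end{pmatrix}.$$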


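Since $\begin{pmatrix} 0 & 1 \\ 1 & -x \end{pmatrix} \begin{pmatrix} x & 1 \\ 1 & 0 \end{pmatrix} = I$ for any x, we can invert to obtain

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$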


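Here we used that the inverse of a product of matrices is the product of the inverses of those matrices, in reverse order. If we write M for the product of the six matrices on the right-hand side, say

$$M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix},$$

where α, β, γ, δ are integers (since the set of integer matrices is closed under multiplication), then

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} = M \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

where

$$\alpha\delta - \beta\gamma = \det M = (-1)^6 = 1,$$

since M is the product of six matrices, each of determinant −1, and the determinant of the product of matrices equals the product of the determinants. Now

$$\begin{pmatrix} 85 \\ 48 \end{pmatrix} = M \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha \\ \gamma \end{pmatrix},$$

so that α = 85 and γ = 48. This implies that

$$85\delta - 48\beta = 1;$$

that is, the matrix method gives us the solution to (1.2.1) without extra effort.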
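If we multiply the matrices defining M together in order, we obtain the sequence

$$\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 7 & 2 \\ 4 & 1 \end{pmatrix},$$

and then

$$\begin{pmatrix} 16 & 7 \\ 9 & 4 \end{pmatrix}, \quad \begin{pmatrix} 23 & 16 \\ 13 & 9 \end{pmatrix}, \quad \begin{pmatrix} 85 & 23 \\ 48 & 13 \end{pmatrix}.$$

We notice that the columns give us the numerators and denominators of the convergents of the continued fraction for 85/48, as discussed in section 1.5.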




We can generalize this discussion to formally explain the Euclidean algorithm: 


Let $u_0 := a \ge u_1 := b \ge 1$. Given $u_j \ge u_{j+1} \ge 1$:

• Let $a_j = [u_j/u_{j+1}]$, an integer ≥ 1.
• Let $u_{j+2} = u_j - a_j u_{j+1}$, so that $0 \le u_{j+2} \le u_{j+1} - 1$.
• If $u_{j+2} = 0$, then $g := \gcd(a, b) = u_{j+1}$, and terminate the algorithm.
• Otherwise, repeat these steps with the new pair $u_{j+1}, u_{j+2}$.

The first two steps work by Lemma 1.1.1, the third by exercise 1.3. We end up with the continued fraction

$$a/b = [a_0, a_1, \ldots, a_k]$$

for some k ≥ 0. The convergents $p_j/q_j = [a_0, a_1, \ldots, a_j]$ are most easily calculated by matrix arithmetic as

$$\begin{pmatrix} p_j & p_{j-1} \\ q_j & q_{j-1} \end{pmatrix} = \begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_j & 1 \\ 1 & 0 \end{pmatrix}, \tag{1.8.1}$$

so that $a/g = p_k$ and $b/g = q_k$, where g = gcd(a, b).
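The algorithm and the matrix product for the convergents translate directly into code. Below is a minimal Python sketch (function names ours) that assumes a ≥ b ≥ 1. It stores 2-by-2 matrices as nested tuples and multiplies together the matrices $\begin{pmatrix} a_j & 1 \\ 1 & 0 \end{pmatrix}$ one at a time. The first column of each partial product should then be $(p_j, q_j)$.

```python
def partial_quotients(a, b):
    """Run the Euclidean algorithm above on a >= b >= 1.

    Returns ([a_0, ..., a_k], g) with a/b = [a_0, ..., a_k] and g = gcd(a, b).
    """
    quotients = []
    u, v = a, b                   # the current pair (u_j, u_{j+1})
    while True:
        q = u // v                # a_j = [u_j / u_{j+1}]
        quotients.append(q)
        u, v = v, u - q * v       # new pair (u_{j+1}, u_{j+2})
        if v == 0:                # u_{j+2} = 0, so g = u_{j+1}
            return quotients, u


def mat_mul(X, Y):
    """Product of 2-by-2 matrices stored as ((x11, x12), (x21, x22))."""
    return tuple(tuple(sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))


def convergent_matrices(quotients):
    """Yield the partial products ((p_j, p_{j-1}), (q_j, q_{j-1})), j = 0, 1, ..."""
    P = ((1, 0), (0, 1))
    for q in quotients:
        P = mat_mul(P, ((q, 1), (1, 0)))
        yield P


qs, g = partial_quotients(85, 48)
print(qs, g)   # [1, 1, 3, 2, 1, 3] 1
print([(P[0][0], P[1][0]) for P in convergent_matrices(qs)])
# [(1, 1), (2, 1), (7, 4), (16, 9), (23, 13), (85, 48)]
```

The last matrix produced is $\begin{pmatrix} 85 & 23 \\ 48 & 13 \end{pmatrix}$, in agreement with the computation for 85/48 above.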


Exercise 1.8.1. Prove that this description of the Euclidean algorithm really works. 


Exercise 1.8.2. (a) Show that $p_j q_{j-1} - p_{j-1} q_j = (-1)^{j+1}$ for all j ≥ 0.
(b) Explain how to use the Euclidean algorithm, along with (1.8.1), to determine, for given positive integers a and b, an integer solution u, v to the equation au + bv = gcd(a, b).


Exercise 1.8.3. With the notation as above, show that $[a_k, \ldots, a_0] = a/c$ for some integer c for which 0 < c < a and $bc \equiv (-1)^k \pmod a$.


Exercise 1.8.4. Prove that for every n ≥ 1 we have

$$\begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n,$$

where $F_n$ is the nth Fibonacci number.
My favorite open question in this area is Zaremba's conjecture: he conjectured that there is an integer B ≥ 1 such that for every integer n ≥ 2 there exists a fraction m/n, where m is an integer, 1 ≤ m ≤ n − 1, coprime with n, for which the continued fraction $m/n = [a_0, a_1, \ldots, a_k]$ has each partial quotient $a_j \le B$. Calculations suggest one can take B = 5.
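Calculations of this kind are easy to reproduce on a small scale. The following is a minimal brute-force Python sketch (function names ours) that searches for such an m. For instance, it confirms that B = 4 does not suffice for n = 6, since 1/6 = [0, 6] and 5/6 = [0, 1, 5]. A finite search is evidence, not a proof.

```python
from math import gcd


def cf_quotients(m, n):
    """Partial quotients [a_0, a_1, ..., a_k] of m/n, via the Euclidean algorithm."""
    quotients = []
    while n:
        q, r = divmod(m, n)
        quotients.append(q)
        m, n = n, r
    return quotients


def zaremba_witness(n, B):
    """Return the least m, 1 <= m <= n - 1, coprime with n, such that every
    partial quotient of m/n is <= B; return None if there is no such m."""
    for m in range(1, n):
        if gcd(m, n) == 1 and max(cf_quotients(m, n)) <= B:
            return m
    return None


print(zaremba_witness(6, 4), zaremba_witness(6, 5))   # None 5
```

A small-scale version of the calculations mentioned above is `all(zaremba_witness(n, 5) is not None for n in range(2, 400))`, which should return True.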


1.9. Euclid matrices and ideal transformations 


In section 1.3 we used Euclid's algorithm to transform the basis of the ideal I(85, 48) to I(48, 37), and so on, until we showed that it equals I(1, 0) = I(1). The transformation rested on the identity

$$85m + 48n = 48m' + 37n', \quad \text{where } m' = m + n \text{ and } n' = m;$$

a transformation we can write as

$$(m, n) \mapsto (m', n') = (m, n) \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$




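The transformation of linear forms can then be seen by

$$48m' + 37n' = (m', n') \begin{pmatrix} 48 \\ 37 \end{pmatrix} = (m, n) \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 48 \\ 37 \end{pmatrix} = (m, n) \begin{pmatrix} 85 \\ 48 \end{pmatrix} = 85m + 48n.$$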


The inverse map can be found simply by inverting the matrix:

$$(m', n') \mapsto (m, n) = (m', n') \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}.$$


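These linear transformations can be composed by multiplying the relevant matrices, which are the same matrices that arise in the previous section, section 1.8. For example, after three steps, the change is

$$(m, n) \mapsto (m_3, n_3) = (m, n) \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \end{pmatrix} = (m, n) \begin{pmatrix} 7 & 2 \\ 4 & 1 \end{pmatrix},$$

so that $11m_3 + 4n_3 = 85m + 48n$.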


Exercise 1.9.1. (a) With the notation of section 1.8, establish that $xu_j + yu_{j+1} = ma + nb$, where the variables x and y are obtained from the variables m and n by a linear transformation.

(b) Deduce that $I(u_j, u_{j+1}) = I(a, b)$ for j = 0, ..., k.


1.10. The dynamics of the Euclidean algorithm 


We now explain a dynamical perspective on the Euclidean algorithm, by focusing on each individual transformation of the pair of numbers with which we work. In our example, we began with the pair of numbers (85, 48), subtracted the smaller from the larger to get (37, 48), and then swapped the order to obtain (48, 37). Now we begin with the fraction x := 85/48; the first step transforms x ↦ y := x − 1 = 37/48, and the second transforms y ↦ 1/y = 48/37. The Euclidean algorithm can easily be broken down into a series of steps of this form:

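$$\frac{85}{48} \mapsto \frac{37}{48} \mapsto \frac{48}{37} \mapsto \frac{11}{37} \mapsto \frac{37}{11} \mapsto \frac{26}{11} \mapsto \frac{15}{11} \mapsto \frac{4}{11} \mapsto \frac{11}{4} \mapsto \frac{7}{4} \mapsto \frac{3}{4} \mapsto \frac{4}{3} \mapsto \frac{1}{3} \mapsto \frac{3}{1} \mapsto \frac{2}{1} \mapsto \frac{1}{1} \mapsto 0.$$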
It is possible that the map x ↦ x − 1 is repeated several times consecutively (for example, as we went from 37/11 to 4/11), the number of times corresponding to the quotient, [x]. On the other hand, the map y ↦ 1/y is not immediately repeated, since repeating this map sends y back to y, which corresponds to swapping the order of a pair of numbers twice, sending the pair back to their original order.


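These two linear maps correspond to our matrix transformations: x ↦ x − 1 corresponds to $\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, so that

$$\begin{pmatrix} 37 \\ 48 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 85 \\ 48 \end{pmatrix},$$

and y ↦ 1/y corresponds to $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, so that

$$\begin{pmatrix} 48 \\ 37 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 37 \\ 48 \end{pmatrix}.$$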


The Euclidean algorithm is therefore a series of transformations of the form x ↦ x − 1 and y ↦ 1/y, and defines a finite sequence of these transformations that begins with any given positive rational number and ends with 0. One can invert that sequence of transformations, to transformations of the form x ↦ x + 1 and y ↦ 1/y, to begin with 0 and to end at any given positive rational number.
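Here is a minimal Python sketch of this dynamical description (function names ours). It uses exact rational arithmetic from the standard `fractions` module. The first function drives a positive rational x to 0 with the two moves, the second runs the moves backwards from 0, and the lengths of the runs of x ↦ x − 1 recover the partial quotients.

```python
from fractions import Fraction
from itertools import groupby


def euclid_dynamics(x):
    """Reduce a positive rational x to 0 using only x -> x - 1 and y -> 1/y.

    Returns the moves, each "minus1" or "invert", in the order they are applied.
    """
    x = Fraction(x)
    moves = []
    while x != 0:
        if x >= 1:
            x -= 1
            moves.append("minus1")
        else:                       # 0 < x < 1
            x = 1 / x
            moves.append("invert")
    return moves


def rebuild(moves):
    """Run the moves backwards from 0, using x -> x + 1 and y -> 1/y."""
    x = Fraction(0)
    for move in reversed(moves):
        x = x + 1 if move == "minus1" else 1 / x
    return x


def quotient_runs(moves):
    """Lengths of the runs of "minus1" (a_0 = 0 is not listed when x < 1)."""
    return [len(list(run)) for move, run in groupby(moves) if move == "minus1"]


moves = euclid_dynamics(Fraction(85, 48))
print(quotient_runs(moves))   # [1, 1, 3, 2, 1, 3]
print(rebuild(moves))         # 85/48
```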


Determinant 1 transformations. Foreshadowing later results, it is more useful 
to develop a variant on the Euclidean algorithm in which the matrices of all of the 
transformations have determinant 1. To begin with, we break each transformation 
down into the two steps: 


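• Beginning with the pair 85, 48 the first step is to subtract 1 times 48 from 85, and in general we subtract q times b from a. This transformation is therefore given by

$$\begin{pmatrix} a \\ b \end{pmatrix} \mapsto \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}, \quad \text{and notice that} \quad \begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}^{q}.$$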


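• The second step swaps the roles of 37 (= 85 − 48) and 48, corresponding to a matrix of determinant −1. Here we do something unintuitive, which is to change 48 to −48, so that the matrix has determinant 1:

$$\begin{pmatrix} 37 \\ 48 \end{pmatrix} \mapsto \begin{pmatrix} -48 \\ 37 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 37 \\ 48 \end{pmatrix}, \quad \text{and more generally} \quad \begin{pmatrix} a \\ b \end{pmatrix} \mapsto \begin{pmatrix} -b \\ a \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}.$$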


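One then sees that if g = gcd(a, b) and $a/b = [a_0, \ldots, a_k]$, then (the signs alternate because each swap also introduces a minus sign)

$$\begin{pmatrix} 0 \\ \pm g \end{pmatrix} = \begin{pmatrix} 1 & (-1)^{k+1} a_k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & a_1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -a_0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}.$$

We write $S := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $T := \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, so that $\begin{pmatrix} 1 & -q \\ 0 & 1 \end{pmatrix} = S^{-q}$ and $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = T^{-1}$. Taking inverses here we get

$$\begin{pmatrix} a \\ b \end{pmatrix} = S^{a_0}\, T\, S^{-a_1}\, T \cdots T\, S^{(-1)^k a_k} \begin{pmatrix} 0 \\ \pm g \end{pmatrix}.$$

If a and b are coprime, then this implies that

$$S^{a_0}\, T\, S^{-a_1}\, T \cdots T\, S^{(-1)^k a_k} = \pm \begin{pmatrix} c & a \\ d & b \end{pmatrix} \tag{1.10.1}$$

for some integers c and d. The left-hand side is the product of determinant one matrices, and so the right-hand side also has determinant one; that is, cb − ad = 1.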
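This is therefore an element of SL(2, Z), the subgroup (under multiplication) of 2-by-2 integer matrices of determinant one; more specifically

$$SL(2, \mathbb{Z}) = \left\{ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} : \alpha, \beta, \gamma, \delta \in \mathbb{Z},\ \alpha\delta - \beta\gamma = 1 \right\}.$$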
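The sign bookkeeping in this determinant-1 version is easy to get wrong, so here is a small Python check (helper names ours). It assumes $S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $T = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, so that the step matrices are $S^{-q}$ and $T^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. For coprime positive a, b it forms $S^{a_0} T S^{-a_1} T \cdots T S^{(-1)^k a_k}$ and checks two things: the determinant should be 1, and the last column should be (a, b) up to an overall sign. This is a numerical sanity check under those assumptions, not a proof.

```python
from math import gcd


def mat_mul(X, Y):
    """Product of 2-by-2 integer matrices stored as ((x11, x12), (x21, x22))."""
    return tuple(tuple(sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))


def mat_pow(X, e):
    """X**e for any integer e; for e < 0 we use the inverse, assuming det X = 1."""
    if e < 0:
        (p, q), (r, s) = X
        X, e = ((s, -q), (-r, p)), -e
    result = ((1, 0), (0, 1))
    for _ in range(e):
        result = mat_mul(result, X)
    return result


S = ((1, 1), (0, 1))
T = ((0, 1), (-1, 0))


def st_word(a, b):
    """Return S^{a_0} T S^{-a_1} T ... T S^{(-1)^k a_k}, where a/b = [a_0, ..., a_k]."""
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    quotients = []
    while b:
        q, r = divmod(a, b)
        quotients.append(q)
        a, b = b, r
    P = ((1, 0), (0, 1))
    for j, q in enumerate(quotients):
        if j > 0:
            P = mat_mul(P, T)
        P = mat_mul(P, mat_pow(S, (-1) ** j * q))
    return P


P = st_word(85, 48)
print(P[0][1], P[1][1])                        # the last column: (85, 48) up to sign
print(P[0][0] * P[1][1] - P[0][1] * P[1][0])   # 1
```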


Theorem 1.2. Each matrix in SL(2, Z) can be represented as $S^{e_1} T^{f_1} \cdots S^{e_r} T^{f_r}$ for integers $e_1, f_1, \ldots, e_r, f_r$.


Proof. Suppose that we are given $\begin{pmatrix} x & a \\ y & b \end{pmatrix} \in SL(2, \mathbb{Z})$. Taking determinants we see that bx − ay = 1. Therefore gcd(a, b) = 1, and so above we saw how to construct an element of SL(2, Z) with the same last column. In Theorem 3.5 we will show that every other integer solution to bx − ay = 1 is given by x = c − ma, y = d − mb for some integer m. Therefore

$$\begin{pmatrix} x & a \\ y & b \end{pmatrix} = \begin{pmatrix} c & a \\ d & b \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix}.$$


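One can easily verify that

$$T^{-1} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad \text{so that} \quad TST^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix},$$

and therefore

$$\begin{pmatrix} 1 & 0 \\ -m & 1 \end{pmatrix} = (TST^{-1})^m = TS^mT^{-1}.$$

Combining these last two statements together with (1.10.1) (and noting that $T^2 = -I$, so that any overall sign can be absorbed) completes the proof of the theorem.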


Appendix 1B. Computational aspects of the Euclidean algorithm


In this appendix we study the speed at which the Euclidean algorithm works. First we look for simple ways to speed the algorithm up. Secondly we formulate a practical way to measure how fast an algorithm works, and identify and analyze the running time of the worst-case scenario for the Euclidean algorithm.


1.11. Speeding up the Euclidean algorithm 


There are various simple steps that can be used to speed up the Euclidean algorithm. For example, if one of the two initial numbers is odd, then we know that the gcd is odd, so we can divide out any factor of 2 that we come across while implementing the algorithm¹⁴. Hence in our favorite example, since 48 = 2⁴·3 and 85 is odd, we have

(85, 48) → (85, 3) → (1, 3) = 1,


which is much simpler. Another popular option is to allow minus signs: in each step of the Euclidean algorithm we replace a and b by b and r, the remainder when a is divided by b. If r is "large", that is, r is close to b, then we can replace r by r − b (which is negative) or, ignoring the minus sign, by b − r. Hence as 85 − 2·48 = −11, we have

(85, 48) → (48, 11) → (11, 4) → (4, 1) = 1.


Again this is faster than the usual Euclidean algorithm. In practice one tries to 
combine these two ideas, and typically this speeds up Euclid’s algorithm by a factor 
of 2 or more. 
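Here is a minimal Python sketch of the second idea (the function name is ours), assuming non-negative inputs. It records the pairs visited, so the two variants can be compared on our example. It does not implement the removal of factors of 2.

```python
def euclid_chain(a, b, nearest=False):
    """Pairs visited by Euclid's algorithm on (a, b), ending with (gcd(a, b), 0).

    With nearest=True the remainder r is replaced by b - r whenever b - r < r,
    which is the "allow minus signs" variant described above.
    """
    chain = [(a, b)]
    while b:
        r = a % b
        if nearest and b - r < r:
            r = b - r               # allowed, since gcd(b, b - r) = gcd(b, r)
        a, b = b, r
        chain.append((a, b))
    return chain


print(euclid_chain(85, 48))                # six divisions, ending at (1, 0)
print(euclid_chain(85, 48, nearest=True))  # [(85, 48), (48, 11), (11, 4), (4, 1), (1, 0)]
```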


¹⁴ This is particularly easy if the numbers are represented in base 2, as on a computer, since then one can simply remove any trailing 0's.




1.12. Euclid’s algorithm works in “polynomial time” 


In section 1.8 of appendix 1A we gave notation for Euclid's algorithm, and we now write $v_n = u_{k+2-n}$ so that we take the numbers in Euclid's algorithm in reverse order. Thus $v_0 = 0$ and $v_1 = g$. Now each $a_j \ge 1$, which implies that $u_j = a_j u_{j+1} + u_{j+2} \ge u_{j+1} + u_{j+2}$, and therefore

$$v_{i+2} \ge v_{i+1} + v_i$$

for all i ≥ 0.


Exercise 1.12.1. (a) Prove that if $F_n$ is the nth Fibonacci number, then $v_n \ge F_n$ for all n.
(b)‡ Show that this inequality cannot be improved, in general.
(c) Show that if we apply the usual Euclid algorithm to a > b ≥ 1, then it terminates in

$$\le \frac{\log a}{\log \phi} + 2 \ \text{steps, where } \phi = \frac{1 + \sqrt{5}}{2}.$$


It is important to determine whether a given algorithm is practical, and to do that we figure out how many steps that algorithm takes. In exercise 1.12.1 we have just seen that the Euclidean algorithm on two integers a > b ≥ 1 takes ≤ c log a steps¹⁵. Each step of the algorithm invokes Lemma 1.1.1, and finding q and r can be done in ≤ c log a bit operations¹⁶. Hence in total the Euclidean algorithm works in ≤ c(log a)² bit operations.
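These step counts are easy to explore numerically. In the Python sketch below, we assume that one "step" means one division a = qb + r, and the function names are ours. Consecutive Fibonacci numbers $(F_{n+1}, F_n)$ take n − 1 steps. The script also finds the worst case over a small range and compares it with the bound of exercise 1.12.1(c). This is a sanity check over a finite range, not a proof.

```python
from math import log, sqrt

PHI = (1 + sqrt(5)) / 2


def euclid_steps(a, b):
    """Number of divisions a = q*b + r performed by the usual Euclidean algorithm."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps


def fibonacci(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x


print(euclid_steps(85, 48))                         # 6
print(euclid_steps(fibonacci(21), fibonacci(20)))   # 19
worst = max(euclid_steps(a, b) for a in range(2, 500) for b in range(1, a))
print(worst, log(499) / log(PHI) + 2)               # worst case vs. the bound
```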

We want to decide whether or not this means that the Euclidean algorithm is efficient. Any algorithm computes some function f, which maps a given input to some value. If the input is n, then to write it down, say in binary, takes c log n bits. Hence we cannot expect an algorithm to work any quicker than this. A polynomial time algorithm is one that, given an input of d binary digits, computes the output in time ≤ $cd^A$ bit operations, for some constants c > 0 and A > 0. A polynomial time algorithm is considered to be an efficient algorithm. The Euclidean algorithm is a polynomial time algorithm (with A = 2) and therefore is efficient.


Exercise 1.12.2. Find two coprime four-digit integers a and b for which the Euclidean algorithm 
works in (a) as few steps as possible and (b) as many steps as possible. 


In exercise 1.12.1, particularly part (b), we have focused on the "worst case", the slowest possible example in Euclid's algorithm. It is more instructive, but more difficult, to study what "typically happens". This is explained by the Gauss-Kuzmin law: for a typical fraction a/b, the proportion of partial quotients in the continued fraction of a/b which equal k is roughly¹⁷

$$\frac{1}{\log 2} \log\left(\frac{(k+1)^2}{k(k+2)}\right),$$


so a typical continued fraction has roughly 41.5% 1's, 17% 2's, 9.3% 3's, 5.9% 4's, 4% 5's, etc., and has length roughly $c_0 \log a$ where $c_0 = \frac{12 \log 2}{\pi^2} = 0.8427\ldots$. This is not much smaller than the upper bound that we obtained for the worst case in exercise 1.12.1(c). (The constant there is $1/\log \phi = 2.0780\ldots$.)

¹⁵ Here, and henceforth, c stands for some positive constant. We do not try to specify its value since that would be complicated and is not really the point. The point is to understand how the running time of an algorithm varies with the size of the input. The value of c might even be different from one line to the next, not to confuse the reader but simply to reiterate that its exact value is irrelevant.

¹⁶ A bit operation is the addition, subtraction, or shifting of two bits, each 0 or 1, in computer machine code.

¹⁷ The words "typical" and "roughly" here are deliberately vague, as correctly formulating this result would require a substantial amount of notation. Our formulation gives the flavor of the correct result.
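The percentages quoted above come straight from the Gauss-Kuzmin formula. The short Python sketch below (the function name is ours) evaluates it and also the constant $c_0 = 12\log 2/\pi^2$. Since the factors $(k+1)^2/(k(k+2))$ telescope, the predicted proportions add up to 1. This only evaluates the formula and does not test the law empirically.

```python
from math import log, log2, pi


def gauss_kuzmin(k):
    """Gauss-Kuzmin prediction for the proportion of partial quotients equal to k."""
    return log2((k + 1) ** 2 / (k * (k + 2)))


for k in range(1, 6):
    print(k, f"{100 * gauss_kuzmin(k):.1f}%")
# prints 41.5%, 17.0%, 9.3%, 5.9%, 4.1% for k = 1, ..., 5
print(12 * log(2) / pi ** 2)   # 0.8427...
```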


A careful analysis of the Gauss-Kuzmin law suggests that typically, the largest partial quotient in the continued fraction of a/N, with 1 ≤ a ≤ N − 1, is about $\frac{12}{\pi^2} \log N$. It also suggests Zaremba's 1971 conjecture that, for every integer N > 1, all of the partial quotients of the continued fraction of at least one of the a/N are ≤ 5. This is an open problem. The best result we have was given by Bourgain and Kontorovich in 2014: they showed that the partial quotients of the continued fraction of at least one of the a/N are all ≤ 50, for 100% of integers N¹⁸.


In this subsection we have not been too clear about what base we have used for our logarithms (nor does it matter, since we have been working with ratios of logarithms, which take the same value, no matter what base). In advanced mathematics we typically work with the natural logarithm, which is the logarithm to the base e, where

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \frac{1}{5!} + \cdots = 2.71828\ldots.$$

All logarithms in this book are natural logarithms. There is good reason for this: typically this is first justified by noting that $e^x$ is the only function f(x) with f(0) = 1 for which $\frac{df}{dx} = f$, or perhaps because, when optimizing compound interest, one learns that $(1 + \frac{1}{N})^N \to e$ as N → ∞. We shall see a number-theoretic justification when we study the distribution of primes in chapter 5.


¹⁸ Beware, this does not mean all N, nor "all N from some point onwards". In fact, "100% of integers N" means that the proportion of integers N ≤ x for which this fails tends to 0 as x → ∞.


Appendix 1C. Magic squares 


1.13. Turtle power 


The only 3-by-3 (normal) magic square up to isomorphism 


2 | 9 | 4
7 | 5 | 3
6 | 1 | 8


was given to mankind by a turtle from the river Lo, in the days of the legendary 
Emperor Yu of China (~ 2200 B.C.). It is magic because it is a square array of 
distinct integers in which all rows, columns, and main diagonals have the same sum. 


By the 12th century, the Jaina in India had found a 4-by-4 normal magic square:

 7 | 12 |  1 | 14
 2 | 13 |  8 | 11
16 |  3 | 10 |  5
 9 |  6 | 15 |  4

Albrecht Dürer engraved Melencolia I in 1514 (part of which we display on the left), including a 4-by-4 magic square containing the numbers 15 and 14 (which therefore gave the date) in the center of the bottom row. Each row, column, and diagonal sums to 34, as do the four integers of the inner 2-by-2 square and the four corners. Any two symmetrically placed integers, around the center, sum to 17.




A magic square is an n-by-n square array, in which the rows and columns all sum up to the same amount. All 3-by-3 magic squares may be parametrized in terms of 5 variables, $x_1, x_2, x_3, a, b$:

$$\begin{array}{|c|c|c|} \hline x_1 & x_3 + a & x_2 + b \\ \hline x_3 + b & x_2 & x_1 + a \\ \hline x_2 + a & x_1 + b & x_3 \\ \hline \end{array}$$

Figure 1.2. The general magic square of order 3.


A magic square is normal if it contains each of the integers 1, 2, ..., n² exactly once. One can construct normal magic squares of every order n ≥ 3. However, we do not know how many there are of each order.


Some seemingly new magic squares are obtained from old ones by a simple transformation: for example, we can rotate the turtle's magic square by 90°, or we can reflect it through a vertical line, to obtain

6 | 7 | 2          4 | 9 | 2
1 | 5 | 9   and    3 | 5 | 7
8 | 3 | 4          8 | 1 | 6


In order to make progress with the problem of classifying magic squares we will write the integers in the turtle's magic square, minus 1, in base 3. Similarly we write the Jaina magic square, minus 1, in base 4:

01 | 22 | 10          12 | 23 | 00 | 31
20 | 11 | 02   and    01 | 30 | 13 | 22
12 | 00 | 21          33 | 02 | 21 | 10
                      20 | 11 | 32 | 03


In both cases it becomes obvious that all of the row and column sums must be the same, since in each digit position, each possible digit appears exactly once in each row and column.


Exercise 1.13.1. Prove that if A is a magic square, then mA + n is also a magic square for any integers m, n, where $(mA + n)_{i,j} = mA_{i,j} + n$ for all i, j.


1.14. Latin squares 


A Latin square is an n-by-n square of integers, in which each row and each column contains each of the numbers {0, 1, ..., n − 1} exactly once. Thus we have the following 3-by-3 Latin squares:

0 | 2 | 1          1 | 2 | 0
2 | 1 | 0   and    0 | 1 | 2
1 | 0 | 2          2 | 0 | 1


Three times the entries of the first plus the entries of the second plus 1 gives our original 3-by-3 magic square. Replacing 0, 1, 2 by $x_1, x_2, x_3$ in the first and by b, 0, a in the second and adding, we obtain the general magic square in Figure 1.2.

In general if we are given two n-by-n Latin squares A and B, then we can construct an n-by-n magic square M by the formula

$$M_{i,j} = nA_{i,j} + B_{i,j} + 1.$$




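Since $0 \le A_{i,j}, B_{i,j} \le n - 1$, therefore $1 \le M_{i,j} \le n^2$. Moreover, for each fixed j,

$$\sum_{i=0}^{n-1} M_{i,j} = n \sum_{i=0}^{n-1} A_{i,j} + \sum_{i=0}^{n-1} B_{i,j} + n = n \sum_{k=0}^{n-1} k + \sum_{k=0}^{n-1} k + n = \frac{n(n^2 + 1)}{2},$$

since each column of A and of B contains each of {0, 1, ..., n − 1} exactly once.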


The analogous argument shows that each row sum of M takes the same value.


In order that M is normal, the values of $nA_{i,j} + B_{i,j}$ must all be different, which is the same as saying that the pairs $(A_{i,j}, B_{i,j})$ are all different. Any two n-by-n Latin squares A and B in which no two positions have the same A- and B-values are called orthogonal, as in our example above.
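The construction $M_{i,j} = nA_{i,j} + B_{i,j} + 1$ can be checked mechanically. Here is a minimal Python sketch (helper names ours) that applies it to the two 3-by-3 Latin squares above. It recovers the turtle's magic square.

```python
def is_latin(L):
    """True if every row and every column of L contains each of 0, ..., n-1 once."""
    n = len(L)
    symbols = set(range(n))
    return (all(set(row) == symbols for row in L)
            and all({L[i][j] for i in range(n)} == symbols for j in range(n)))


def orthogonal(A, B):
    """True if the pairs (A[i][j], B[i][j]) are all different."""
    n = len(A)
    return len({(A[i][j], B[i][j]) for i in range(n) for j in range(n)}) == n * n


def combine(A, B):
    """M[i][j] = n*A[i][j] + B[i][j] + 1, as in the construction above."""
    n = len(A)
    return [[n * A[i][j] + B[i][j] + 1 for j in range(n)] for i in range(n)]


A = [[0, 2, 1], [2, 1, 0], [1, 0, 2]]
B = [[1, 2, 0], [0, 1, 2], [2, 0, 1]]
print(is_latin(A), is_latin(B), orthogonal(A, B))   # True True True
print(combine(A, B))   # [[2, 9, 4], [7, 5, 3], [6, 1, 8]]
```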


Exercise 1.14.1. For a given integer n > 1, let $(\ell)_n$ be the least non-negative residue of ℓ (mod n).
(a) Show that if (ab, n) = 1 and $A_{i,j} = (ai + bj)_n$, then A is an n-by-n Latin square.
(b) Show that if, also, (cd, n) = (ad − bc, n) = 1 and $B_{i,j} = (ci + dj)_n$, then A and B are orthogonal n-by-n Latin squares.
(c) Prove that there exist integers a, b, c, d for which (abcd, n) = (ad − bc, n) = 1 if and only if n is odd.
(d) Deduce that there are n-by-n normal magic squares whenever n is odd and > 1.


It is an open question to count the number of pairs of orthogonal n-by-n Latin squares. However it is known that if n = 10 or 14 or any number of the form 4k + 10 with k > 0, then there is at least one pair of orthogonal n-by-n Latin squares and therefore a corresponding normal magic square.


1.15. Factoring magic squares 


Let A be an n-by-n normal magic square, and let B(r, s) be m-by-m normal magic squares for 0 ≤ r, s ≤ n − 1 (it could be that each B(r, s) = B). We construct an mn-by-mn magic square C as follows:

$$C_{(i-1)m + I,\,(j-1)m + J} = m^2 (A_{i,j} - 1) + B(i-1, j-1)_{I,J}$$

for 1 ≤ i, j ≤ n and 1 ≤ I, J ≤ m.


Exercise 1.15.1. (a) Verify that C is a normal magic square. 
(b) Suppose that we have constructed a normal magic square of order 8. Deduce that there are 
normal magic squares of order n, whenever n is divisible by 4. 


Exercise 1.15.2. Let n = 4m and define O = {0, ..., m − 1} ∪ {3m, ..., 4m − 1}, whereas I = {m, ..., 3m − 1}. We define an n-by-n magic square A with (i, j)th entry, for 0 ≤ i, j ≤ n − 1,

$$A(i, j) = \begin{cases} in + j + 1 & \text{if } i, j \in I \text{ or } i, j \in O, \\ (n - 1 - i)n + (n - 1 - j) + 1 & \text{if } i \in I \text{ and } j \in O, \text{ or if } i \in O \text{ and } j \in I. \end{cases}$$

Prove that A is a normal n-by-n magic square in which any two symmetrically placed integers, around the center, sum to $n^2 + 1$.
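Before attempting the proof, one can check the claims of this exercise for small n. The following Python sketch (helper names ours) builds A(i, j) exactly as defined above, using indices 0, ..., n − 1. It then checks four properties: the square is normal, all rows, columns and both diagonals share the same sum, and symmetrically placed entries add to n² + 1. This is a numerical check for a few n, not a proof.

```python
def doubly_even_square(n):
    """The n-by-n square A(i, j) of exercise 1.15.2, for n = 4m, indices 0..n-1."""
    if n % 4:
        raise ValueError("n must be divisible by 4")
    m = n // 4
    inner = set(range(m, 3 * m))                  # the index set I; O is the rest
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if (i in inner) == (j in inner):      # i, j both in I or both in O
                A[i][j] = i * n + j + 1
            else:
                A[i][j] = (n - 1 - i) * n + (n - 1 - j) + 1
    return A


def check_square(A):
    """True if A is a normal magic square (rows, columns, both diagonals) in
    which symmetrically placed entries sum to n^2 + 1."""
    n = len(A)
    target = n * (n * n + 1) // 2
    lines = list(A)
    lines += [[A[i][j] for i in range(n)] for j in range(n)]
    lines += [[A[i][i] for i in range(n)], [A[i][n - 1 - i] for i in range(n)]]
    normal = sorted(x for row in A for x in row) == list(range(1, n * n + 1))
    symmetric = all(A[i][j] + A[n - 1 - i][n - 1 - j] == n * n + 1
                    for i in range(n) for j in range(n))
    return normal and symmetric and all(sum(line) == target for line in lines)


for row in doubly_even_square(4):
    print(row)   # [1, 15, 14, 4], [12, 6, 7, 9], [8, 10, 11, 5], [13, 3, 2, 16]
print(all(check_square(doubly_even_square(n)) for n in (4, 8, 12, 16)))   # True
```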


Reference for this appendix 


[1] Harold M. Stark, chapter 4 of An introduction to number theory, MIT Press, Cambridge, Mass.- 
London, 1978. 


There are many websites with constructions of magic and Latin squares. 


Appendix 1D. The Frobenius postage stamp problem


1.16. The Frobenius postage stamp problem, I


The Frobenius postage stamp problem asks, given an unlimited supply of postage stamps worth a cents and b cents, what (precise) amounts of postage can we stick on an envelope? That is, what is the set of values

$$P(a, b) := \{am + bn : m, n \in \mathbb{Z},\ m, n \ge 0\}?$$

(To obtain am + bn cents we take m stamps worth a cents and n stamps worth b cents.) We only allow non-negative coefficients for a and b in our linear combinations, whereas in I(a, b) there is no such restriction.

We will begin with the example in which we have stamps of value 3 and 5 cents: 
Evidently we can make up any multiple of 3 cents and any multiple of 5 cents. But 
how about with a mixture of different types of stamps? When we start trying all 
possible combinations we find 


3 = 1·3;  5 = 1·5;  6 = 2·3;  8 = 1·3 + 1·5;  9 = 3·3;  10 = 2·5;  11 = 2·3 + 1·5;
and it seems that from here on we get every integer value:
12 = 4·3;  13 = 1·3 + 2·5;  14 = 3·3 + 1·5;  15 = 5·3;  16 = 2·3 + 2·5;  17 = 4·3 + 1·5.


To prove that we indeed get every integer value, we study these representations and look for a pattern. A quick inspection reveals that if we wish to pay n cents postage and we already know how to pay n − 3 cents, then we simply add a 3-cent stamp. Hence once we know how to pay exactly 8 cents, exactly 9 cents, and exactly 10 cents, we can deduce that every integer n ≥ 8 belongs to P(3, 5).
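The "add a 3-cent stamp" argument translates directly into a small Python sketch that produces an explicit representation for every n ≥ 8. The function name `pay_3_5` is hypothetical; it simply starts from the known ways to pay 8, 9 or 10 cents and adds 3-cent stamps.

```python
def pay_3_5(n):
    """Return (m, k) with n = 3*m + 5*k and m, k >= 0, for any integer n >= 8."""
    if n < 8:
        raise ValueError("the argument only covers n >= 8")
    base = {8: (1, 1), 9: (3, 0), 10: (0, 2)}  # 8 = 1*3 + 1*5, 9 = 3*3, 10 = 2*5
    start = 8 + (n - 8) % 3                     # the one of 8, 9, 10 congruent to n mod 3
    m, k = base[start]
    return m + (n - start) // 3, k              # add (n - start)/3 stamps worth 3 cents


print(pay_3_5(13), pay_3_5(17))  # (1, 2) (4, 1)
```

These agree with 13 = 1·3 + 2·5 and 17 = 4·3 + 1·5 in the list above.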


Exercise 1.16.1. (a) Rewrite this last proof as a formal proof by induction to establish that {n ∈ Z: n ≥ 8} ⊆ P(3, 5).
(b) Suppose that 1 < a < b. Assume there exists an integer N (which may depend on a and b), for which N, N + 1, ..., N + a − 1 all belong to P(a, b). Deduce that {n ∈ Z: n ≥ N} ⊆ P(a, b).




In section 3.25 of appendix 3G, we will develop a technique for establishing that this assumption holds for some integer N whenever gcd(a, b) = 1.


We have now proven that P(3, 5) = {n ≥ 0: n ≠ 1, 2, 4, or 7}.

What is the connection between P(a, b) and I(a, b)? First we note that if r = am + bn, then −r = a(−m) + b(−n), so that I(a, b) ⊇ P(a, b) ∪ −P(a, b). In our example above this means that I(3, 5) ⊇ {n ∈ Z: n ≠ ±1, ±2, ±4, ±7}. It is not difficult to establish that ±1, ±2, ±4, ±7 ∈ I(3, 5), by constructing linear combinations like 1 = 2·3 − 1·5, and so I(3, 5) = Z.

There is an easier way to show that I(3, 5) = Z: We know that there are two consecutive integers in P(3, 5), for example 13 = 1·3 + 2·5 and 14 = 3·3 + 1·5. Subtracting one from the other we have

$$1 = 14 - 13 = (3 - 1)\cdot 3 + (1 - 2)\cdot 5 = 2\cdot 3 - 1\cdot 5.$$

Multiplying through by n, we can write any integer n as (2n)·3 − n·5, and so I(3, 5) = Z.

We proved above that P(3, 5) = {n ≥ 0} \ E(3, 5), where E(3, 5) := {1, 2, 4, 7} is the set of exceptional numbers (that is, those non-negative integers not of the form 3m + 5n with m, n ≥ 0). We notice that E(3, 5) is a finite set and that if r ∈ E(3, 5), then r < 3·5 and gcd(r, 3·5) = 1. These properties, and more, hold for any E(a, b), where a and b are coprime integers, as we will see in sections of appendix 3G and of appendix 15B.
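To gather data for the exercises below, one can list E(a, b) by brute force. This Python sketch relies on the bound r < ab stated above, so it only searches below ab; the function name `exceptional_set` is ours.

```python
from math import gcd


def exceptional_set(a, b):
    """E(a, b): non-negative integers not of the form a*m + b*n with m, n >= 0.

    Assumes a, b are coprime positive integers and, as stated in the text,
    that every exceptional number is less than a*b.
    """
    if a <= 0 or b <= 0 or gcd(a, b) != 1:
        raise ValueError("a and b must be coprime positive integers")
    limit = a * b
    representable = set()
    for m in range(limit // a + 1):
        for n in range(limit // b + 1):
            v = a * m + b * n
            if v < limit:
                representable.add(v)
    return [r for r in range(limit) if r not in representable]


print(exceptional_set(3, 5))  # [1, 2, 4, 7]
```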


Exercise 1.16.2. (a) Prove that E(2, 3) = {1}.

(b) An integer n is a power if $n = m^k$ for some integers m and k ≥ 2. Prove that we can write any power n as $n = a^2 b^3$ where a and b are integers.


Exercise 1.16.3. (a) Construct E(a, b) for various pairs of coprime integers a and b.
(b) Guess at a formula (in terms of a and b) for the largest element of E(a, b), that is, the largest positive integer not representable in the form am + bn with m, n ≥ 0. (Use your data from part (a).)
(c)† Prove that your conjectural formula in (b) is true.


Appendix 1E. Egyptian 
fractions 


1.17. Simple fractions 


The ancient Egyptians represented all fractions, a/b with (a, b) = 1, as a sum of distinct fractions of the form 1/n. Such a representation is called an Egyptian fraction, and each 1/n is a unit fraction. For example, if n is odd and n + 1 = 2m, then

$$\frac{2}{n} = \frac{1}{m} + \frac{1}{mn}.$$


Exercise 1.17.1. Deduce that 1/b with b > 1 may always be written as 1/m + 1/n with m and n distinct positive integers.


We have shown how to write a/b with (a, b) = 1 as a sum of unit fractions when a = 1 and a = 2. How about when b > a ≥ 3?


Exercise 1.17.2. Suppose that b > a ≥ 2 are positive coprime integers. Let q = [b/a].
(a) Prove that a/b = 1/(q + 1) + A/B where A and B are positive integers with A < a, B.
(b) Deduce that a/b can be written as a sum of no more than a distinct unit fractions.
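Iterating the step of part (a) gives a procedure that is usually called the greedy (or Fibonacci–Sylvester) algorithm. The Python sketch below is one way to run it with exact rational arithmetic; the function name `egyptian_expansion` is ours, and it is an illustration rather than a solution of the exercise.

```python
from fractions import Fraction


def egyptian_expansion(a, b):
    """Denominators of distinct unit fractions summing to a/b, for 0 < a < b.

    Repeats the step a/b = 1/(q+1) + A/B with q = [b/a]; Fraction keeps every
    remainder in lowest terms, and the numerator strictly decreases.
    """
    if not 0 < a < b:
        raise ValueError("expects 0 < a < b")
    x = Fraction(a, b)
    denominators = []
    while x.numerator != 1:
        q = x.denominator // x.numerator  # q = [b/a] for the current a/b
        denominators.append(q + 1)
        x -= Fraction(1, q + 1)           # the remainder A/B, with A < a
    denominators.append(x.denominator)    # the last remainder is a unit fraction
    return denominators


print(egyptian_expansion(3, 7))   # [3, 11, 231]
print(egyptian_expansion(4, 13))  # [4, 18, 468]
```

So 3/7 = 1/3 + 1/11 + 1/231 uses three unit fractions, within the bound of part (b).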


For a = 1 and a = 2 we have shown that a/b can be written as the sum of at most a distinct unit fractions (and this is trivially best possible). In exercise we have shown that a/b can always be written as the sum of at most a distinct unit fractions, but is this best possible for larger a? We will restrict our attention to prime values of b, for if prime p divides b and $a/p = \sum_{i=1}^{k} 1/n_i$, a sum of k distinct unit fractions, then $a/b = \sum_{i=1}^{k} 1/m_i$, where each $m_i = n_i(b/p)$, which is also a sum of k distinct unit fractions.

For a = 3 one can show that 3/7 cannot be written as the sum of two distinct unit fractions, and indeed 3/p cannot be written as the sum of two distinct unit fractions whenever p is a prime of the form 3m + 1. (We will prove this in section 3.26 of appendix 3G.) Hence 3/b can always be written as the sum of at most 3



distinct unit fractions, and this is best possible. One can do better if b is of the form 3m − 1, or even of the form am − 1:

$$\frac{a}{am - 1} = \frac{1}{m} + \frac{1}{m(am - 1)}.$$
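The identity can be confirmed in exact arithmetic with Python's `fractions.Fraction`; with a = 2 it is the representation 2/n = 1/m + 1/(mn) for n = 2m − 1 given earlier. The helper name `check_identity` is ours.

```python
from fractions import Fraction


def check_identity(a, m):
    """Check a/(am - 1) == 1/m + 1/(m(am - 1)) exactly, for integers a, m >= 1 with am > 1."""
    return Fraction(a, a * m - 1) == Fraction(1, m) + Fraction(1, m * (a * m - 1))


print(check_identity(2, 3), check_identity(3, 2))  # True True  (2/5 = 1/3 + 1/15, 3/5 = 1/2 + 1/10)
```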


For a = 4 one can show that 4/13 cannot be written as the sum of two distinct unit fractions, nor can any 4/p whenever p is a prime of the form 4m + 1 (see exercise 3.26.1). Therefore the minimum number of distinct unit fractions needed to represent all fractions of the form 4/b is either 3 or 4. But which is it? The Erdős–Straus conjecture states that 4/n can always be written as the sum of three
unit fractions or less. Although this is an open problem, it is known to be true for 
all n < 10". 


The representation above shows how to represent 4/p whenever p is a prime of the form 4m − 1, and also whenever p is a prime of the form 3n − 1 by adding 1/p to each side of the representation there for 3/p. This leaves us to represent 4/p as a sum of at most three unit fractions, only for the primes p of the form 12k + 1.
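Claims such as "3/7 and 4/13 are not sums of two distinct unit fractions" can be tested by a finite search: if a/b = 1/m + 1/n with m < n, then 1/m < a/b < 2/m, so b/a < m < 2b/a. The Python sketch below implements that search; the function name `two_term_representations` is ours.

```python
from fractions import Fraction


def two_term_representations(a, b):
    """All pairs (m, n) with m < n and a/b = 1/m + 1/n."""
    target = Fraction(a, b)
    reps = []
    for m in range(b // a + 1, 2 * b // a + 1):  # b/a < m <= 2b/a
        rest = target - Fraction(1, m)
        # rest must be a unit fraction 1/n with n > m (so the two terms are distinct)
        if rest > 0 and rest.numerator == 1 and rest.denominator > m:
            reps.append((m, rest.denominator))
    return reps


print(two_term_representations(3, 7), two_term_representations(4, 13))  # [] []
print(two_term_representations(4, 7))  # [(2, 14)]
```

The last line is the case a = 4, m = 2 of the identity for a/(am − 1).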


Chapter 2 


Congruences 


The key step in understanding the Euclidean algorithm, Lemma 1.1.1, shows that gcd(a, b) equals gcd(r, b), because b divides a − r. Inspired by how useful this observation is, Gauss developed the theory of when two given integers, like a and r, differ by a multiple of b:


2.1. Basic congruences 
If m, b, and c are integers for which m divides b − c, then we write

$$b \equiv c \pmod{m}$$

and say that b and c are congruent modulo m, where m is the modulus.¹ The numbers involved should be integers, not fractions, and the modulus can be taken in absolute value; that is, b ≡ c (mod m) if and only if b ≡ c (mod |m|), by definition.

For example, −10 ≡ 15 (mod 5), and −7 ≡ 15 (mod 11), but −7 ≢ 15 (mod 3). Note that b ≡ b (mod m) for all integers m and b.


The integers ≡ a (mod m) are precisely those of the form a + km where k is an integer, that is, a, a + m, a + 2m, ... as well as a − m, a − 2m, a − 3m, .... We call this set of integers a congruence class or residue class mod m, and any particular element of the congruence class is a residue.²


For any given integers a and m > 0, there exists a unique pair of integers q and r with 0 ≤ r ≤ m − 1, for which a = qm + r, by Lemma 1.1.1. Therefore there exists a unique integer r ∈ {0, 1, 2, ..., m − 1} for which a ≡ r (mod m). Moreover, if two integers are congruent mod m, then they leave the same remainder, r, when

¹ Gauss proposed the symbol ≡ because of the analogies between equality and congruence, which we will soon encounter. To avoid ambiguity he made a minor distinction by adding the extra bar.

² The sequence of numbers a, a + m, a + 2m, ..., in which we add m to the last number in the sequence to obtain the next one, is an arithmetic progression.




divided by m. We now prove a generalization of these last remarks: 


Theorem 2.1. Suppose that m is a positive integer. Exactly one of any m consecutive integers is ≡ a (mod m).


Two proofs.³ Suppose we have the m consecutive integers x, x + 1, ..., x + m − 1.


Analytic proof: An integer n in the range x ≤ n < x + m is of the form a + km, for some integer k, if and only if there exists an integer k for which

$$x \le a + km < x + m.$$

Subtracting a from each term here and dividing through by m, we find that this holds if and only if

$$\frac{x - a}{m} \le k < \frac{x - a}{m} + 1.$$

Hence k must be an integer from an interval of length one which has just one endpoint included in the interval. Such an integer k exists and is unique; it is the smallest integer that is ≥ (x − a)/m.


Exercise 2.1.1. Prove that for any real number t there is a unique integer in the interval [t, t + 1).


Number-theoretic proof: By Lemma 1.1.1 there exist integers q and r with 0 ≤ r ≤ m − 1 for which a − x = qm + r. Then x ≤ x + r ≤ x + m − 1 and x + r = a − qm ≡ a (mod m), and so x + r is the integer that we are looking for. We still need to prove that it is unique:

If x + i ≡ a (mod m) and x + j ≡ a (mod m), where 0 ≤ i < j ≤ m − 1, then i ≡ a − x ≡ j (mod m), so that m divides j − i, which is impossible as 1 ≤ j − i ≤ m − 1.
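Both proofs are constructive, so each yields a one-line computation. In the Python sketch below (function names ours), the number-theoretic version uses the remainder r = (a − x) mod m, and the analytic version uses k = ⌈(x − a)/m⌉; Python's `%` and `//` with a positive modulus give exactly the remainder 0 ≤ r ≤ m − 1 and the floor, even for negative arguments.

```python
def in_window_number_theoretic(x, m, a):
    """The unique n with x <= n <= x + m - 1 and n ≡ a (mod m), via a - x = qm + r."""
    r = (a - x) % m  # for m > 0, Python returns 0 <= r <= m - 1
    return x + r


def in_window_analytic(x, m, a):
    """The same n as a + km, with k the smallest integer >= (x - a)/m."""
    k = -((a - x) // m)  # ceiling of (x - a)/m, computed with exact integer floor division
    return a + k * m


print(in_window_number_theoretic(-7, 5, 3), in_window_analytic(-7, 5, 3))  # -7 -7
```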


Exercise 2.1.2. Prove that m divides (n − 1)(n − 2)···(n − m) for every integer n and every integer m ≥ 1.


Theorem 2.1 implies that any m consecutive integers yield a complete set of residues (mod m); that is, every congruence class (mod m) is represented by exactly one element of the given set of m integers. For example, every integer has a unique residue amongst

the least non-negative residues (mod m): 0, 1, 2, ..., m − 1,
as well as amongst
the least positive residues (mod m): 1, 2, ..., m,

and also amongst

the least negative residues (mod m): −(m − 1), −(m − 2), ..., −2, −1, 0.


For example, 2 is the least positive residue of −13 (mod 5), whereas −3 is the least negative residue; and 5 is its own least positive residue mod 7, whereas −2 is the least negative residue. Notice that if the residue is not ≡ 0 (mod m), then these residues occur in pairs, one positive and the other negative, and at least one of each pair is ≤ m/2 in absolute value. We call this the absolutely least residue (mod m) (and we select m/2, rather than −m/2, when m is even).⁴ For example, if m = 5, we can pair up the least positive residues and the least negative residues as

1 ≡ −4 (mod 5),  2 ≡ −3 (mod 5),  3 ≡ −2 (mod 5),  4 ≡ −1 (mod 5),

as well as the exceptional 5 ≡ 0 (mod 5). Hence the absolutely least residues (mod 5) are −2, −1, 0, 1, 2. Similarly the absolutely least residues (mod 6) are −2, −1, 0, 1, 2, 3. More generally, if m = 2k + 1 is odd, then the absolutely least residues (mod 2k + 1) are −k, ..., −1, 0, 1, ..., k; and if m = 2k is even, then the absolutely least residues (mod 2k) are −(k − 1), ..., −1, 0, 1, ..., k.

³ Why give two proofs? Throughout this book we will frequently take the opportunity to give more than one proof of a key result. The idea is to highlight different aspects of the theory that are, or will become, of interest. Here we find both an analytic proof (meaning that we focus on the size or quantity of the objects involved) as well as a number-theoretic proof (in which we use their algebraic properties). Sometimes the interplay between these two perspectives can take us much further than either one alone.
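The four residue systems just described can be computed from the least non-negative residue. In this Python sketch (function name ours), the absolutely least residue keeps +m/2 rather than −m/2 when m is even, matching the convention above.

```python
def residue_systems(a, m):
    """(least non-negative, least positive, least negative, absolutely least) residue of a mod m, m > 0."""
    r = a % m                                        # least non-negative residue, 0 <= r <= m - 1
    least_positive = r if r != 0 else m              # chosen from 1, 2, ..., m
    least_negative = r - m if r != 0 else 0          # chosen from -(m-1), ..., -1, 0
    absolutely_least = r if r <= m // 2 else r - m   # keeps +m/2 when m is even
    return r, least_positive, least_negative, absolutely_least


print(residue_systems(-13, 5))  # (2, 2, -3, 2)
print(residue_systems(5, 7))    # (5, 5, -2, -2)
print(sorted(residue_systems(a, 6)[3] for a in range(6)))  # [-2, -1, 0, 1, 2, 3]
```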


We defined a complete set of residues to be any set of representatives for the 
residue classes mod m, one for each residue class. A reduced set of residues has 
representatives only for the residue classes that are coprime with m. For example 
{0,1,2,3,4,5} is a complete set of residues (mod 6), whereas {1,5} is a reduced set 
of residues, as 0, 2, and 4 are divisible by 2, and 0 and 3 are divisible by 3 and so 
are excluded. 


Exercise 2.1.3. Suppose that $a_1, \dots, a_m$ is a complete set of residues mod m. Prove that m divides $(n - a_1)\cdots(n - a_m)$ for every integer n.
Exercise 2.1.4. (a) Explain how “a number of the form 3n − 1” means the same thing as “a number of the form 3n + 2”, using the language of congruences.
(b) Prove that the set of integers in the congruence class a (mod d) can be partitioned into the set of integers in the congruence classes a (mod kd), a + d (mod kd), ... and a + (k − 1)d (mod kd).


Exercise 2.1.5. Show that if a ≡ b (mod m), then (a, m) = (b, m).
Exercise 2.1.6. Prove that if a ≡ b (mod m), then a ≡ b (mod d) for any divisor d of m.
Exercise 2.1.7. Satisfy yourself that addition and multiplication mod m are commutative.⁵


Exercise 2.1.8. Prove that the property of congruence modulo m is an equivalence relation on the integers. To prove this, one must establish

(i) a ≡ a (mod m);
(ii) a ≡ b (mod m) implies b ≡ a (mod m);
(iii) a ≡ b (mod m) and b ≡ c (mod m) imply a ≡ c (mod m).

The equivalence classes are therefore the congruence classes mod m.


One consequence of this is that integers that are congruent modulo m have the 
same least residues modulo m, whereas integers that are not congruent modulo m 
have different least residues. 


The main use of congruences is that they simplify arithmetic when we are looking into questions about remainders. This is because the usual rules for addition, subtraction, and multiplication work for congruences. However, division is a little more complicated, as we will see in the next section.


⁴ This is often called the least residue in absolute value.

⁵ A mathematical operation is commutative if you get the same result no matter what order you take the input variables in. Thus, in ℂ, we have x + y = y + x and xy = yx. There are common operations that are not commutative; for example a − b ≠ b − a in ℂ, unless a = b. Moreover multiplication in different settings might not be commutative, for example when we multiply 2-by-2 matrices, as we discovered, in detail, in section 0.12 of appendix 0D.




Lemma 2.1.1. If a ≡ b (mod m) and c ≡ d (mod m), then

a + c ≡ b + d (mod m),
a − c ≡ b − d (mod m),
and ac ≡ bd (mod m).

Proof. By hypothesis there exist integers u and v for which a − b = um and c − d = vm. Therefore

(a + c) − (b + d) = (a − b) + (c − d) = um + vm = (u + v)m,

so that a + c ≡ b + d (mod m);

(a − c) − (b − d) = (a − b) − (c − d) = um − vm = (u − v)m,

so that a − c ≡ b − d (mod m); and

ac − bd = a(c − d) + d(a − b) = a·vm + d·um = (av + du)m,

so that ac ≡ bd (mod m).


These are the rules of modular arithmetic. 


Exercise 2.1.9. Under the hypothesis of Lemma show that ka + lc ≡ kb + ld (mod m) for any integers k and l.


Exercise 2.1.10. If p | m and m/p ≡ a (mod q), then prove that m ≡ ap (mod q).


2.2. The trouble with division 


Although the rules for addition, subtraction, and multiplication work for congruences as they do for the integers, reals, and most other mathematical objects we have encountered, the rule for division is more subtle. In the complex numbers, if we are given numbers a and b ≠ 0, then there exists a unique value of c for which a = bc (so that c = a/b), and therefore there is no ambiguity in the definition of division. We now look at the multiplication tables mod 5 and mod 6 to see whether this same property holds for modular arithmetic:


× | 0 1 2 3 4
--+----------
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

The multiplication table (mod 5).


Other than in the top row, we see that every congruence class mod 5 appears exactly once in each row of the table. For example, in the row corresponding to the multiples of 2, mod 5, we have 0, 2, 4, 1, 3, which implies that for each a (mod 5) there exists a unique value of c (mod 5) for which a ≡ 2c (mod 5); that is, c ≡ a/2 (mod 5). We read off

0/2 ≡ 0 (mod 5),  1/2 ≡ 3 (mod 5),  2/2 ≡ 1 (mod 5),
3/2 ≡ 4 (mod 5),  and  4/2 ≡ 2 (mod 5),

each division leading to a unique value. This is true in each row other than the top one, so for every non-zero value of b (mod 5) and every a (mod 5), there exists a unique c (mod 5) for which bc ≡ a (mod 5). Therefore division by any non-zero b is well- (and uniquely) defined modulo 5.


However, the multiplication table mod 6 looks rather different.

× | 0 1 2 3 4 5
--+------------
0 | 0 0 0 0 0 0
1 | 0 1 2 3 4 5
2 | 0 2 4 0 2 4
3 | 0 3 0 3 0 3
4 | 0 4 2 0 4 2
5 | 0 5 4 3 2 1

The multiplication table (mod 6).


The row corresponding to the multiples of 5, mod 6, is 0, 5, 4, 3, 2, 1, so that 
each b/5 (mod 6) is well-defined. 


However, the row corresponding to the multiples of 2, mod 6, reads 0, 2, 4, 0, 2, 4. 
There is no solution to 1/2 (mod 6). On the other hand, for something as simple 
as 4/2 (mod 6), there are two different solutions: 5 (mod 6) as well as 2 (mod 6). 
Evidently it is more complicated to understand division mod 6 than mod 5. 
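Reading a row of the table amounts to solving bc ≡ a (mod m) for c. A brute-force Python sketch (the name `quotients` is ours) reproduces the observations above: a unique answer mod 5, and either none or two answers when dividing by 2 mod 6.

```python
def quotients(a, b, m):
    """All c in {0, ..., m-1} with b*c ≡ a (mod m), i.e. every candidate value of a/b mod m."""
    return [c for c in range(m) if (b * c - a) % m == 0]


print(quotients(1, 2, 5))  # [3]     1/2 ≡ 3 (mod 5)
print(quotients(4, 2, 6))  # [2, 5]  two solutions mod 6
print(quotients(1, 2, 6))  # []      no solution mod 6
print(quotients(1, 5, 6))  # [5]     division by 5 is fine mod 6
```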


We can obtain a hint of what is going on by applying exercise 2.1.4, which implies that the union of the sets of integers in the two arithmetic progressions 5 (mod 6) and 2 (mod 6) gives exactly the integers ≡ 2 (mod 3). So we now have a unique solution to 4/2 (mod 6), albeit a congruence class belonging to a different modulus.


Exercise 2.2.1. Determine one congruence class which gives all solutions to 3 divided by 3 (mod 6). (In other words, find a congruence class a (mod m) such that 3x ≡ 3 (mod 6) if and only if x ≡ a (mod m).)


These issues with division arise when we try to solve equations by division: If we divide each side of 8 ≡ 2 (mod 6) by 2, we obtain the incorrect “4 ≡ 1 (mod 6)”. We can correct this by dividing the modulus through by 2 also, so as to obtain 4 ≡ 1 (mod 3). Even this is not the whole story, for if we wish to divide both sides of 21 ≡ 6 (mod 5) through by 3, we cannot also divide the modulus, since 3 does not divide 5. However, in this case one does not need to divide the modulus through by 3, since 7 ≡ 2 (mod 5). So what is the general rule? We shall resolve all of these issues in Lemma 3.5.1, after we have developed a little more theory.




2.3. Congruences for polynomials 


Let Z[x] denote the set of polynomials with integer coefficients. Using the above rules for congruences, one gets a very useful result for congruences involving polynomials:


Corollary 2.3.1. If f(x) ∈ Z[x] and a ≡ b (mod m), then f(a) ≡ f(b) (mod m).


Proof. Since a ≡ b (mod m) we have $a^2 \equiv b^2 \pmod{m}$ by Lemma 2.1.1, and then, by induction, $a^k \equiv b^k \pmod{m}$ for every integer k ≥ 1:


Exercise 2.3.1. Prove that $a^k \equiv b^k \pmod{m}$ for all integers k ≥ 1, by induction.


Now, writing $f(x) = \sum_{i=0}^{d} f_i x^i$ where each $f_i$ is an integer, we have

$$f(a) = \sum_{i=0}^{d} f_i a^i \equiv \sum_{i=0}^{d} f_i b^i = f(b) \pmod{m},$$

by Lemma 2.1.1.


This result can be extended to polynomials in many variables.
Exercise 2.3.2. Deduce, from Corollary that if f(t) ∈ Z[t] and r, s ∈ Z, then r − s divides f(r) − f(s).


Therefore, for any polynomial f(x) ∈ Z[x], the sequence f(0), f(1), f(2), ... modulo m is periodic of period m; that is, the values repeat every m terms, indefinitely. More precisely, f(n + m) ≡ f(n) (mod m) for all integers n.


Example. If $f(x) = x^3 - 8x + 6$ and m = 5, then we get the sequence

f(0), f(1), ... ≡ 1, 4, 3, 4, 3, 1, 4, 3, 4, 3, 1, ... (mod 5),

and the first five terms 1, 4, 3, 4, 3 repeat infinitely often. Moreover we get the same pattern if we run through the consecutive negative integer values for x.


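Note that in this example f(x) is never ≡ 0 or 2 (mod 5). Thus none of the equations

$$x^3 - 8x + 6 = 0, \quad y^3 - 8y + 1 = 0, \quad \text{and} \quad z^3 - 8z + 4 = 0$$

can have solutions in integers x, y, or z, since they say that f(x) = 0, f(y) = 5, and f(z) = 2, respectively.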
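The sequence in this example can be reproduced by evaluating f mod 5 with Horner's rule, reducing after every step; Lemma 2.1.1 is what justifies reducing in the middle of the computation. The helper `poly_mod` and its coefficient convention (constant term first) are our own choices for this sketch.

```python
def poly_mod(coeffs, x, m):
    """Evaluate f(x) mod m by Horner's rule, where coeffs = [f_0, f_1, ..., f_d]."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % m  # reduce after each multiply-and-add step
    return value


f = [6, -8, 0, 1]  # f(x) = x^3 - 8x + 6
print([poly_mod(f, x, 5) for x in range(11)])          # [1, 4, 3, 4, 3, 1, 4, 3, 4, 3, 1]
print(sorted({poly_mod(f, x, 5) for x in range(5)}))   # [1, 3, 4]: never 0 or 2
```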


Exercise 2.3.3. Let f(x) ∈ Z[x]. Suppose that f(r) ≢ 0 (mod m) for all integers r in the range 0 ≤ r ≤ m − 1. Deduce that there does not exist an integer n for which f(n) = 0.


2.4. Tests for divisibility 


There are easy tests for divisibility based on ideas from this chapter. For example, writing an integer in decimal⁶ as

a + 10b + 100c + ···,


⁶ More precisely, $\sum_{i=0}^{d} a_i 10^i$ where each $a_i$ is an integer in {0, 1, 2, ..., 9} and $a_d \ne 0$. Why did we write the decimal expansion so informally in the text, when surely good mathematics is all about precision? While good mathematics is anchored by precision, mathematical writing also requires good communication (after all, why shouldn't the reader understand with as little effort as possible?), and so we attempt to explain accurately with as little notation as possible.




we employ Corollary 2.3.1 with $f(x) = a + bx + cx^2 + \cdots$ and m = 9; since 10 ≡ 1 (mod 9), this gives

$$a + 10b + 100c + \cdots = f(10) \equiv f(1) = a + b + c + \cdots \pmod{9}.$$

Therefore we can test whether the integer a + 10b + 100c + ··· is divisible by 9 by testing whether the much smaller integer a + b + c + ··· is divisible by 9. In other words, if an integer is written in decimal notation, then it is divisible by 9 if and only if the sum of its digits is divisible by 9. This same test works for divisibility by 3 (by exercise 2.1.6) since 3 divides 9. For example, to decide whether 7361842509 is divisible by 9, we need only decide whether 7 + 3 + 6 + 1 + 8 + 4 + 2 + 5 + 0 + 9 = 45 is divisible by 9, and this holds if and only if 4 + 5 = 9 is divisible by 9, which it obviously is.

One can test for divisibility by 11 in a similar way: Since 10 ≡ −1 (mod 11), we deduce that f(10) ≡ f(−1) (mod 11) from Corollary 2.3.1, and so

$$a + 10b + 100c + \cdots \equiv a - b + c - \cdots \pmod{11}.$$

Therefore 7361842509 is divisible by 11 if and only if 7 − 3 + 6 − 1 + 8 − 4 + 2 − 5 + 0 − 9 = 1 is divisible by 11, which it is not.
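Both tests are easy to express in Python for a non-negative integer given in decimal. In this sketch (function names ours) the alternating sum starts with the units digit a, as in the congruence above.

```python
def digit_sum(n):
    """a + b + c + ... for n = a + 10b + 100c + ... (n >= 0)."""
    return sum(int(ch) for ch in str(n))


def alternating_digit_sum(n):
    """a - b + c - ..., starting with the units digit a (n >= 0)."""
    return sum((-1) ** i * int(ch) for i, ch in enumerate(reversed(str(n))))


N = 7361842509
print(digit_sum(N), N % 9)               # 45 0
print(alternating_digit_sum(N), N % 11)  # -1 10
```

Here −1 ≡ 10 (mod 11), as the congruence predicts. Summing from the leading digit, as in the text, gives +1 instead; for a number with an even count of digits the two sums differ only in sign, so the divisibility test is unaffected.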


One may determine similar (but slightly more complicated) rules to test for divisibility by any integer, though we will need to develop our theory of congruences further. We return to this theme in section 7.7.


Exercise 2.4.1. (a) Invent tests for divisibility by 2 and 5 (easy).

(b) Invent tests for divisibility by 7 and 13 (similar to the above).
(c)† Create one test that tests for divisibility by 7, 11, and 13 simultaneously (assuming that one knows about the divisibility by 7, 11, and 13 of every non-negative integer up to 1000).


Additional exercises 


Exercise 2.5.1. Prove that if a, b, and c are integers and $d = b^2 - 4ac$, then d ≡ 0 or 1 (mod 4).
Exercise 2.5.2. Prove that if $N = a^2 - b^2$, then either N is odd or N is divisible by 4.


Exercise 2.5.3. (a) Prove that 2 divides n(3n + 101) for every integer n. 
(b) Prove that 3 divides n(2n + 1)(n+ 10) for every integer n. 
(c) Prove that 5 divides n(n + 1)(2n + 1)(3n + 1)(4n + 1) for every integer n. 


Exercise 2.5.4. (a) Prove that, for any given integer k ≥ 1, exactly k of any km consecutive integers are ≡ a (mod m).
(b)† Let I be an interval of length N. Prove that the number of integers in I that are ≡ a (mod m) is between N/m − 1 and N/m + 1.
(c) By considering the number of even integers in (0, 2) and then in [0, 2], show that (b) cannot be improved, in general.


Exercise 2.5.5. The Universal Product Code (that is, the bar code used to identify items in the supermarket) has 12 digits, each between 0 and 9, which we denote by $d_1, \dots, d_{12}$. The first 11 digits identify the product. The 12th is chosen to be the least residue of

$$3d_1 - d_2 + 3d_3 - d_4 + \cdots - d_{10} + 3d_{11} \pmod{10}.$$

(a) Deduce that $d_1 + 3d_2 + d_3 + \cdots + d_{11} + 3d_{12}$ is divisible by 10.
(b) Deduce that if the scanner does not read all the digits correctly, then either the sum in (a) will not be divisible by 10 or the scanner has misread at least two digits.
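A Python sketch of this scheme, following the formulas as written in this exercise (we do not claim it reproduces any particular real-world bar-code standard). The function names and the sample digits are made up for illustration.

```python
def check_digit(d):
    """d = [d_1, ..., d_11]; least residue of 3d_1 - d_2 + 3d_3 - ... - d_10 + 3d_11 (mod 10)."""
    if len(d) != 11:
        raise ValueError("need the first 11 digits")
    return sum((3 if i % 2 == 0 else -1) * digit for i, digit in enumerate(d)) % 10


def weighted_sum(code):
    """d_1 + 3d_2 + d_3 + ... + d_11 + 3d_12 for a full 12-digit code."""
    return sum((1 if i % 2 == 0 else 3) * digit for i, digit in enumerate(code))


digits = [0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 5]  # hypothetical product digits
code = digits + [check_digit(digits)]
print(check_digit(digits), weighted_sum(code) % 10)  # 6 0
```

Changing any single digit of `code` changes the weighted sum by 1 or 3 times a non-zero amount between −9 and 9, which is never a multiple of 10; this is the phenomenon behind part (b).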


Exercise 2.5.6. (a) Take $f(x) = x^2$ in Corollary to determine the squares modulo m, for m = 3, 4, 5, 6, 7, 8, 9, and 10. (“The squares modulo m” are those congruence classes (mod m) that are equivalent to the square of at least one congruence class (mod m).)




(b) Show that there are no solutions in integers x, y, z to $x^2 + y^2 = z^2$ with x and y odd.
(c) Show that if $x^2 + y^2 = z^2$, then 3 divides xy.
(d) Show that there are no solutions in integers x, y, z to $x^2 + y^2 = 3z^2$ with (x, y) = 1.
(e) Show that there are no solutions in integers x, y, z to $x^2 + y^2 = 666z^2$ with (x, y) = 1.
(f) Prove that no integer ≡ 7 (mod 8) can be written as the sum of three squares of integers.


Exercise 2.5.7.† Show that if $x^3 + y^3 = z^3$, then 7 divides xyz.


Binomial coefficients modulo p 


We will assume that p is prime throughout the next two sections.


Exercise 2.5.8. Use the formula for $\binom{p}{j}$ given in (0.3.1) to prove that p divides $\binom{p}{j}$ for all integers j in the range 1 ≤ j ≤ p − 1. This implies that $\frac{1}{p}\binom{p}{j}$ is an integer.

For 1 ≤ j ≤ p − 1 we can write $\binom{p-1}{j}$ as $\frac{p-1}{1}\cdot\frac{p-2}{2}\cdots\frac{p-j}{j}$. There is considerable cancelation when we reduce this latter expression mod p.
Exercise 2.5.9. (a) Prove that $\binom{p-1}{j} \equiv (-1)^j \pmod{p}$ for all j, 0 ≤ j ≤ p − 1.
(b) Prove that $\frac{1}{p}\binom{p}{j} \equiv (-1)^{j-1}/j \pmod{p}$ for all j, 1 ≤ j ≤ p − 1.


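Exercise 2.5.10.† (a) Prove that $\binom{ap}{bp} \equiv \binom{a}{b} \pmod{p}$ whenever a, b ≥ 0.
(b) Prove that $\binom{ap+c}{bp+d} \equiv \binom{a}{b}\binom{c}{d} \pmod{p}$ whenever 0 ≤ c, d ≤ p − 1. (Remember that $\binom{c}{d} = 0$ if c < d.)
(c) If $m = m_0 + m_1 p + m_2 p^2 + \cdots + m_k p^k$ and $n = n_0 + n_1 p + \cdots + n_k p^k$ are non-negative integers written in base p, deduce Lucas's Theorem (by induction on k ≥ 0), that

$$\binom{m}{n} \equiv \prod_{i=0}^{k} \binom{m_i}{n_i} \pmod{p}.$$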
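Lucas's Theorem turns a large binomial coefficient mod p into a product of small ones over the base-p digits. The Python sketch below (function name ours) uses `math.comb`, which returns 0 when the lower argument exceeds the upper one, matching the convention $\binom{c}{d} = 0$ for c < d.

```python
from math import comb


def binom_mod_lucas(m, n, p):
    """C(m, n) mod p via Lucas's Theorem, for a prime p and integers m, n >= 0."""
    result = 1
    while m > 0 or n > 0:
        m, m_i = divmod(m, p)  # peel off the next base-p digit of m
        n, n_i = divmod(n, p)  # and of n
        result = result * comb(m_i, n_i) % p
    return result


print(binom_mod_lucas(1000, 300, 7), comb(1000, 300) % 7)  # the two numbers agree
```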


One can extend the notion of congruences to polynomials with integer coefficients: For f(x), g(x) ∈ Z[x] we have f(x) ≡ g(x) (mod m) if and only if there exists a polynomial h(x) ∈ Z[x] for which f(x) − g(x) = m h(x). This notion can be extended even further to polynomials in several variables.


The binomial theorem for n = 3 gives
$$(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3.$$
Notice that the two middle coefficients here are both 3, and so
$$(x + y)^3 \equiv x^3 + y^3 \pmod{3}.$$
Similarly
$$(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5 \equiv x^5 + y^5 \pmod{5},$$

since all four of the middle coefficients are divisible by 5. This does not generalize to all exponents n; for example, for n = 4 we have $(x + y)^4 \equiv x^4 + 2x^2y^2 + y^4 \pmod{4}$, which is not congruent to $x^4 + y^4 \pmod{4}$, but the above does generalize to all prime exponents, as we will see in the next exercise.
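One can see which exponents behave like 3 and 5 by checking whether n divides every middle binomial coefficient $\binom{n}{j}$, 1 ≤ j ≤ n − 1. This short Python sketch (function name ours) lists such n below 30 and shows the coefficients of $(x + y)^4$ mod 4.

```python
from math import comb


def middle_coefficients_divisible(n):
    """True if n divides every binomial coefficient C(n, j) with 1 <= j <= n - 1."""
    return all(comb(n, j) % n == 0 for j in range(1, n))


print([n for n in range(2, 30) if middle_coefficients_divisible(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print([comb(4, j) % 4 for j in range(5)])  # [1, 0, 2, 0, 1], i.e. x^4 + 2x^2y^2 + y^4
```

In this range the exponents that work are exactly the primes.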


Exercise 2.5.11. Deduce from exercise that $(x + y)^p \equiv x^p + y^p \pmod{p}$ for all primes p.⁷


Exercise 2.5.12. Prove that $(x + y)^{p-1} \equiv x^{p-1} - x^{p-2}y + \cdots - xy^{p-2} + y^{p-1} \pmod{p}$.


⁷ This is sometimes called the freshman's dream or the child's binomial theorem, sarcastically referring to the unfortunately common mistaken belief that this works over ℂ, rather than the more complicated binomial theorem, as in section




Exercise 2.5.13. Prove that $(x + y)^{p^k} \equiv x^{p^k} + y^{p^k} \pmod{p}$ for all primes p and integers k ≥ 1.


Exercise 2.5.14. (a) Writing a positive integer $n = n_0 + n_1 p + n_2 p^2 + \cdots$ in base p, use exercise 2.5.13 to prove that

$$(x + y)^n \equiv (x + y)^{n_0} (x^p + y^p)^{n_1} (x^{p^2} + y^{p^2})^{n_2} \cdots \pmod{p}.$$

(b)† Reprove Lucas's Theorem (as in exercise 2.5.10(c)) by studying the coefficient of $x^m y^{n-m}$ in (a).


Exercise 2.5.15. (a) Prove that $(x + y + z)^p \equiv x^p + y^p + z^p \pmod{p}$.
(b) Deduce that $(x_1 + x_2 + \cdots + x_n)^p \equiv x_1^p + x_2^p + \cdots + x_n^p \pmod{p}$ for all n ≥ 2.


The Fibonacci numbers modulo d 


The Fibonacci numbers mod 2 are

0, 1, 1, 0, 1, 1, 0, 1, 1, ...

We see that the Fibonacci numbers modulo 2 are periodic of period 3. The Fibonacci numbers mod 3 are

0, 1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, ...

and so seem to be periodic of period 8. In exercise 1.7.24 we defined $m = m_d$ to be the smallest positive integer for which d divides $F_m$, and showed that d divides $F_n$ if and only if $m_d$ divides n. In our two cases we therefore have $m_2 = 3$, which is the period, and $m_3 = 4$, which is half the period.


In the next exercise we show that Fibonacci numbers (and other such sequences) are periodic mod d, for every integer d > 1, by using the pigeonhole principle. This states that if one puts N + 1 letters into N pigeonholes, then, no matter how one does this, some pigeonhole will contain at least two letters.⁸


Exercise 2.5.16. (a) Prove that the pigeonhole principle is true.
We will now show that the Mersenne numbers $M_n := 2^n - 1$ are periodic mod d.
(b) Show that there exist two integers in the range 0 ≤ r < s ≤ d for which $M_r \equiv M_s \pmod{d}$.
(c) In exercise 0.4.15(b) we saw that the Mersenne numbers satisfy the recurrence $M_{n+1} = 2M_n + 1$. Use this to show that $M_{r+j} \equiv M_{s+j} \pmod{d}$ for all j ≥ 0.
(d) Deduce that there exists a positive integer $p = p_d$, which is ≤ d, such that $M_{n+p} \equiv M_n \pmod{d}$ for all n ≥ d. That is, $M_n$ is eventually periodic mod d with period $p_d \le d$.


An analogous proof works for general second-order linear recurrence sequences, including the Fibonacci numbers. For the rest of this section, we suppose a and b are integers and $\{x_n : n \ge 0\}$ is the second-order linear recurrence sequence given by

$$x_n = a x_{n-1} + b x_{n-2} \text{ for all } n \ge 2, \text{ with } x_0 = 0 \text{ and } x_1 = 1.$$


Exercise 2.5.17. (a) By using the pigeonhole principle creatively, prove that there exist two integers in the range $0 \le r < s \le d^2$ for which $x_r \equiv x_s \pmod{d}$ and $x_{r+1} \equiv x_{s+1} \pmod{d}$.
(b) Use the recurrence for the $x_n$ to show that $x_{r+j} \equiv x_{s+j} \pmod{d}$ for all j ≥ 0.
(c) Deduce that the $x_n$ are eventually periodic mod d with period $\le d^2$.
(d) Prove that $m_d$ divides the period mod d.


⁸ In French, this is the “principle of the drawers”. What evocative metaphors are used to describe this principle in other languages?




We saw above that the Fibonacci numbers mod 3 have period $3^2 - 1$, and further calculations reveal that the period mod d never seems to be larger than $d^2 - 1$, a small improvement over the bound that we obtained in exercise 2.5.17(c). In the next exercise we see how to obtain this bound, in general.
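Such "further calculations" can be done by tracking the state $(x_n, x_{n+1}) \bmod d$, which takes at most $d^2$ values, until a state repeats: this is the pigeonhole argument of exercise 2.5.17 carried out mechanically. The function name `eventual_period` is ours.

```python
def eventual_period(a, b, d):
    """(pre-period, period) of x_n mod d, with x_0 = 0, x_1 = 1, x_n = a*x_{n-1} + b*x_{n-2}."""
    first_seen = {}
    state, n = (0 % d, 1 % d), 0
    while state not in first_seen:   # at most d^2 distinct states, so this loop ends
        first_seen[state] = n
        x, y = state
        state, n = (y, (a * y + b * x) % d), n + 1
    start = first_seen[state]
    return start, n - start


print(eventual_period(1, 1, 2), eventual_period(1, 1, 3))  # (0, 3) (0, 8)
print([eventual_period(1, 1, d)[1] for d in range(2, 11)])
# [3, 8, 6, 20, 24, 16, 12, 24, 60]
```

For each of these d the Fibonacci period is at most $d^2 - 1$, with equality for d = 2 and d = 3.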


Exercise 2.5.18. (a) Show that if there exists a positive integer r for which $x_r \equiv x_{r+1} \equiv 0 \pmod{d}$, then $x_n \equiv 0 \pmod{d}$ for all n ≥ r, so that the $x_n$ are eventually periodic mod d with period 1.
(b) Now assume that there does not exist a positive integer r for which $x_r \equiv x_{r+1} \equiv 0 \pmod{d}$. Modify the proof of exercise 2.5.17 to prove that the $x_n$ are eventually periodic mod d with period $\le d^2 - 1$.


It is possible to get a more precise understanding of the Fibonacci numbers and 
other second-order recurrences, mod d: 


Exercise 2.5.19. In order to understand $x_n \pmod{d}$, we take $m = m_d$ in the results of this exercise.

(a) Prove, by induction, that $x_{m+k} \equiv x_{m+1} x_k \pmod{x_m}$ for all k ≥ 0.

(b) Deduce the same result from exercise 0.4.10.

(c) Deduce that if n = qm + r with 0 ≤ r ≤ m − 1, then $x_n \equiv (x_{m+1})^q x_r \pmod{x_m}$.


We will return to this result in chapter 7 where we study the powers mod n. 


In exercise 0.1.5 we saw the importance of the discriminant⁹ $\Delta := a^2 + 4b$ of the quadratic polynomial $x^2 - ax - b$. The rule for the $x_n$ mod Δ is a little easier:
Exercise 2.5.20. Prove by induction that

(a) $x_{2k} \equiv ka(-b)^{k-1} \pmod{\Delta}$ and $x_{2k+1} \equiv (2k+1)(-b)^k \pmod{\Delta}$ for all k ≥ 0, and
(b) $x_{2k} \equiv kab^{k-1} \pmod{a^2}$ and $x_{2k+1} \equiv b^k \pmod{a^2}$ for all k ≥ 0.
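Before attempting the induction it can help to test both parts numerically. This Python sketch (names ours) checks them for 0 ≤ k ≤ K; it is evidence, not a proof. For k = 0 the factor k makes the even-index term 0, so we use 0 directly rather than a negative power.

```python
def recurrence(a, b, length):
    """x_0, ..., x_{length-1} for x_0 = 0, x_1 = 1, x_n = a*x_{n-1} + b*x_{n-2}."""
    xs = [0, 1]
    while len(xs) < length:
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[:length]


def check_exercise_2_5_20(a, b, K=12):
    """Test parts (a) and (b) of exercise 2.5.20 for 0 <= k <= K (requires a != 0, Delta != 0)."""
    delta = a * a + 4 * b
    if a == 0 or delta == 0:
        raise ValueError("need a != 0 and a^2 + 4b != 0")
    xs = recurrence(a, b, 2 * K + 2)
    for k in range(K + 1):
        even_mod_delta = k * a * (-b) ** (k - 1) if k > 0 else 0
        even_mod_a2 = k * a * b ** (k - 1) if k > 0 else 0
        # (a): congruences modulo Delta
        if (xs[2 * k] - even_mod_delta) % delta or (xs[2 * k + 1] - (2 * k + 1) * (-b) ** k) % delta:
            return False
        # (b): congruences modulo a^2
        if (xs[2 * k] - even_mod_a2) % (a * a) or (xs[2 * k + 1] - b ** k) % (a * a):
            return False
    return True


print(check_exercise_2_5_20(1, 1))  # True (Fibonacci numbers, Delta = 5)
```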




Exercise 2.5.21. Suppose that the sequence $(u_n)_{n \ge 1}$ satisfies a dth-order linear recurrence (as defined in appendix 0B). Prove that for any integer m > 1, the $u_n$ are eventually periodic mod m with period $\le m^d - 1$. (We prove that this bound is best possible when m is prime in exercise 7.25.5.)


⁹ The colon “:” plays many roles in the grammar of mathematics. Here it means that “Henceforth we define Δ to be ....”


Appendix 2A. Congruences 
in the language of groups 


2.6. Further discussion of the basic notion of congruence 


Congruences can be rephrased in the language of groups. The integers, Z, form a group¹⁰ in which addition is the group operation. In exercise 0.11.1 of appendix 0D we proved that the non-trivial, proper subgroups of Z all take the form mZ := {mn: n ∈ Z} for some integer m > 1, that is, the set of integers divisible by m. The congruence classes (mod m) are simply the cosets of mZ inside Z:

0 + mZ, 1 + mZ, 2 + mZ, ..., (m − 1) + mZ,

where

j + mZ := {j + mn: n ∈ Z},
which is the set of integers belonging to the congruence class j (mod m). Notice that the m cosets of mZ are disjoint and their union gives all of Z.


The group operation on Z, namely addition, is inherited by the cosets of mZ. For example, as 7 + 11 = 18 in Z, the same is true when we add together the relevant cosets of mZ in Z; in other words

(7 + mZ) + (11 + mZ) = (18 + mZ).¹¹
This new additive group is the quotient group
Z/mZ.


This is the beginning of the theory of quotient groups, which we develop in the 
next section. 


10 See appendix 0D for a discussion of the basic properties of groups.

11 Throughout, we define the sum of two given sets $A$ and $B$ to be $A + B := \{a + b : a \in A, b \in B\}$, that is, the set of elements that can be represented as $a + b$ with $a \in A$ and $b \in B$. Note that an element may be represented more than once.




The reader should be aware that multiplication mod m (and, in particular, 
how its properties are inherited from Z) does not fit into this discussion of additive 
quotient groups. 


2.7. Cosets of an additive group 


Suppose that $H$ is a subgroup of an additive (and so abelian¹²) group $G$. A coset
of H in G is given by the set 


$a + H := \{a + h : h \in H\}.$


In Proposition 2.7.1 we will show, as in the example $m\mathbb{Z}$ of the previous section, that distinct cosets of $H$ are disjoint and their union gives $G$.


The quotient group $G/H$ has as its elements the distinct cosets $a + H$ and inherits its group law from $G$, in this case addition, so that

$(a + H) + (b + H) = (a + b) + H.$


Proposition 2.7.1. Let $H$ be a subgroup of an additive group $G$. Distinct cosets of $H$ in $G$ are disjoint, so that the elements of $G/H$ are well-defined; and the addition law on $G/H$ is also well-defined. If $G$ is finite, then $|H|$ divides $|G|$ and $|G/H| = |G|/|H|$.


Proof. If $a + H$ and $b + H$ have a common element $c$, then there exist $h_1, h_2 \in H$ such that $a + h_1 = c = b + h_2$. Therefore $b = a + h_1 - h_2 = a + h_0$ where $h_0 = h_1 - h_2 \in H$ since $H$ is a group (and therefore closed under addition and negation). Now if $h \in H$, then $b + h = a + (h_0 + h) \in a + H$, as $h_0 + h \in H$, so that $b + H \subseteq a + H$, and by the analogous argument $a + H \subseteq b + H$. We deduce that $a + H = b + H$. Hence the cosets of $H$ are either identical or disjoint, which means that they partition $G$. Moreover each coset $a + H$ has exactly $|H|$ elements, since $h \mapsto a + h$ is a bijection from $H$ to $a + H$; therefore if $G$ is finite, then $|G| = |G/H| \cdot |H|$, and so $|H|$ divides $|G|$.


This also implies that if $c \in a + H$, then $c + H = a + H$. We wish to show that addition in $G/H$ is well-defined. If $a + H$, $b + H$ are cosets of $H$, then we defined $(a + H) + (b + H) = (a + b) + H$, so we need to verify that the sum of the two cosets does not depend on the choice of representatives of the cosets. So, if $c \in a + H$ and $d \in b + H$, then there exist $h_1, h_2 \in H$ for which $c = a + h_1$ and $d = b + h_2$. Then $c + H = a + H$ and $d + H = b + H$. Moreover $c + d = a + b + (h_1 + h_2) \in a + b + H$, as $H$ is closed under addition (and $G$ is abelian), and so $c + d + H = a + b + H$, as desired. Hence $G/H$ is well-defined, and $|G/H| = |G|/|H|$ when $G$ is finite.


Example. $\mathbb{Z}$ is a subgroup of the additive group $\mathbb{R}$, and the coset $a + \mathbb{Z}$ consists of all real numbers $r$ that differ from $a$ by an integer. Every coset $a + \mathbb{Z}$ has exactly one representative in any given half-open interval $[x, x + 1)$ of length 1, in particular the interval $[0, 1)$, where the coset representative is $\{a\}$, the fractional part of $a$. These cosets are well-defined under addition and yield the quotient group $\mathbb{R}/\mathbb{Z}$.

The exponential map $e: \mathbb{R} \to U := \{z \in \mathbb{C} : |z| = 1\}$, from the real numbers to the unit circle, is defined by $e(t) = e^{2\pi i t}$. Since $e(1) = 1$, therefore $e(n) = e(1)^n = 1$ for every integer $n$. Therefore if $b \in a + \mathbb{Z}$, so that $b = a + n$ for some integer $n$, then $e(b) = e(a + n) = e(a)e(n) = e(a)$, so the value of $e(t)$ depends only on which coset $t$ belongs to in $\mathbb{R}/\mathbb{Z}$. Therefore we can think of the exponential map as the composition of two maps: first the natural quotient map $\mathbb{R} \to \mathbb{R}/\mathbb{Z}$ (that is, $a \mapsto a + \mathbb{Z}$) and then the map $e: \mathbb{R}/\mathbb{Z} \to U$. Picking the representatives $[0, 1)$ for $\mathbb{R}/\mathbb{Z}$, we see that the restricted map $e: [0, 1) \to U$ is 1-to-1.


12 A group $G$ is called abelian or commutative if $ab = ba$ for all elements $a, b \in G$.


By a slight abuse of terminology, we write $a \equiv b \pmod 1$, for real numbers $a$ and $b$, if and only if $a$ and $b$ belong to the same coset of $\mathbb{R}/\mathbb{Z}$.


Exercise 2.7.1. Prove that $a \equiv b \pmod m$ if and only if $a/m$ and $b/m$ belong to the same coset of $\mathbb{R}/\mathbb{Z}$.


Exercise 2.7.2. (a) Prove that $t \equiv \{t\} \pmod 1$ for all real numbers $t$.
(b) Prove that the usual rules of addition and subtraction hold mod 1, as does multiplication by an integer.
(c) Show that division is not always well-defined mod 1, by finding a counterexample.


2.8. A new family of rings and fields 


We have seen, in Lemma 2.1.1, that the congruence classes mod $m$ support both an additive and a multiplicative structure.


Exercise 2.8.1. Prove that $\mathbb{Z}/m\mathbb{Z}$ is a ring for all integers $m \ge 2$.


To be a field, all the non-zero congruence classes of $\mathbb{Z}/m\mathbb{Z}$ would need to have a multiplicative inverse, but this is not the case for all $m$. For example we claim that 3 does not have a multiplicative inverse mod 15. If it did, say $3m \equiv 1 \pmod{15}$, then multiplying through by 5 we obtain $5 = 5 \cdot 1 \equiv 5 \cdot 3m = 15m \equiv 0 \pmod{15}$, which is evidently untrue.


We call 3 and 5 zero divisors since they non-trivially divide 0 in Z/15Z. 
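A brute-force check makes these definitions concrete. The sketch below is our own illustration (the name `classify_residues` is not from the text): for a modulus $m \ge 2$ it tests each non-zero class of $\mathbb{Z}/m\mathbb{Z}$ for a multiplicative inverse and for being a zero divisor. For $m = 15$ it finds 3 and 5 among the zero divisors.

```python
def classify_residues(m):
    """Split the non-zero classes 1, ..., m-1 of Z/mZ (m >= 2) by brute force."""
    units, zero_divisors = [], []
    for a in range(1, m):
        # a is invertible if a*u = 1 (mod m) for some u
        if any(a * u % m == 1 for u in range(1, m)):
            units.append(a)
        # a is a zero divisor if a*b = 0 (mod m) for some non-zero class b
        if any(a * b % m == 0 for b in range(1, m)):
            zero_divisors.append(a)
    return units, zero_divisors

units, zero_divisors = classify_residues(15)
print(units)          # [1, 2, 4, 7, 8, 11, 13, 14]
print(zero_divisors)  # [3, 5, 6, 9, 10, 12]
```

For a prime modulus such as $m = 7$ the second list comes out empty, so every non-zero class is invertible there.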


Exercise 2.8.2. (a) Prove that if m is a composite integer > 1, then Z/mZ has zero divisors. 
(b) Prove that Z/mZ is not a field whenever m is a composite integer > 1. 
(c) Prove that if R is any ring with zero divisors, then R cannot be a field. 


An integral domain is a ring with no zero divisors. Note that Z is an integral 
domain (hence the name) but is not a field. 


If $R$ is a commutative ring and $m \in R$, then $mR$ is an additive subgroup of $R$, and the cosets of $mR$ support a multiplicative structure. To see this, note that if $x \in a + mR$ and $y \in b + mR$, then $x = a + mr_1$ and $y = b + mr_2$ for some $r_1, r_2 \in R$, and so $xy = ab + mr$ where $r = ar_2 + br_1 + mr_1r_2$, which belongs to $R$, as $R$ is closed under both addition and multiplication. That is, $xy \in ab + mR$. Hence $R/mR$ inherits the multiplicative and distributive properties of $R$, as well as the identity element $1 + mR$; and so $R/mR$ is itself a commutative ring.


2.9. The order of an element 


If $g$ is an element of a given group $G$, we define the order of $g$ to be the smallest integer $n \ge 1$ for which $g^n = 1$, where 1 is the identity element of $G$. If no such $n$ exists, then we say that $g$ has infinite order (for example, 1 in the additive group $\mathbb{Z}$). We shall explore the multiplicative order of a reduced residue mod $m$, in detail, in chapter 7.


There is a beautiful observation of Lagrange which restricts the possible order 
of an element in any finite abelian group. 




Theorem 2.2 (Lagrange). If $G$ is a finite abelian group, then the order of any element $g$ of $G$ divides $|G|$, the number of elements in $G$. Moreover, $g^{|G|} = 1$.


Proof. Suppose that $g$ has order $n$ and let $H := \{1, g, g^2, \dots, g^{n-1}\}$, a subgroup of $G$ of order $n$. By Proposition 2.7.1 we deduce that $n = |H|$ divides $|G|$. Moreover if $|G| = mn$, then $g^{|G|} = g^{mn} = (g^n)^m = 1^m = 1$.
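As a numerical illustration, take the group of reduced residues mod $m$ under multiplication. That this is a finite abelian group is assumed here; chapter 7 studies it in detail. The sketch below computes the order of each element by repeated multiplication and asserts both conclusions of Lagrange's Theorem. The helper names are our own.

```python
from math import gcd

def multiplicative_order(g, m):
    """Smallest n >= 1 with g^n = 1 (mod m); assumes m >= 2 and gcd(g, m) = 1."""
    n, power = 1, g % m
    while power != 1:
        power = power * g % m
        n += 1
    return n

def check_lagrange(m):
    """Verify Lagrange's Theorem for the reduced residues mod m (m >= 2)."""
    group = [g for g in range(1, m) if gcd(g, m) == 1]
    size = len(group)
    for g in group:
        n = multiplicative_order(g, m)
        assert size % n == 0, (g, n)       # the order of g divides |G|
        assert pow(g, size, m) == 1, g     # g^|G| = 1 in G
    return size

print(check_lagrange(15))  # 8
```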


Lagrange’s Theorem actually holds for any finite group, non-abelian as well as abelian, as we will see in Corollary 7.23.1 of appendix 7D.


Appendix 2B. The Euclidean algorithm for polynomials


We use the Euclidean algorithm to find the greatest common divisor of two integers. There are sets of mathematical objects, other than the integers, in which an analogy to the Euclidean algorithm works, but to do so, there must be notions analogous to “divisor” and “greatest”. Moreover we know that the Euclidean algorithm terminates in finitely many steps because there are only finitely many positive integers smaller than a given integer; the analogous statement is not true for the real numbers or the rationals. However, one can use the Euclidean algorithm to find the highest degree common polynomial factor of two given polynomials.


2.10. The Euclidean algorithm in $\mathbb{C}[x]$


Non-zero polynomials take the form

$f(x) = a_0 + a_1 x + \cdots + a_d x^d$ where the leading coefficient $a_d \ne 0$,

for some integer $d \ge 0$, where the $a_i$’s belong to some set of mathematical objects, for example, $\mathbb{C}$, $\mathbb{Q}$, or $\mathbb{Z}$. We call $a_d x^d$ the leading term. It is often possible to assume, without loss of generality, that the polynomials involved are monic (that is, $a_d = 1$), since $f(x)$ has exactly the same roots as $cf(x)$, for any non-zero constant $c$ (so we can divide $f(x)$ through by its leading coefficient).

Now suppose that $f(x), g(x) \in \mathbb{C}[x]$. We say that “$g(x)$ divides $f(x)$” if there exists a polynomial $h(x) \in \mathbb{C}[x]$ for which $f(x) = g(x)h(x)$. If $f(x) \ne 0$, then $\deg f = \deg g + \deg h$; in particular, $\deg f \ge \deg g$. The following is the polynomial analogue of Lemma 1.1.1:


Proposition 2.10.1. Suppose that $f(x), g(x) \in \mathbb{C}[x]$, where $D := \deg(g) \ge 1$. There exist unique polynomials $q(x), r(x) \in \mathbb{C}[x]$ for which

$f(x) = q(x)g(x) + r(x),$

where $r(x)$ has degree $< D$.




Proof. We prove the existence of $q$ and $r$ by induction on the degree of $f$. Suppose that $g(x)$ has leading term $b_D x^D$ with $b_D \ne 0$. If $\deg(f) < D$, then let $q(x) = 0$ and $r(x) = f(x)$. Otherwise, suppose that $f(x)$ has leading term $a_d x^d$ with $d \ge D$ and let $F(x) := f(x) - (a_d/b_D)x^{d-D} g(x)$. The leading terms of $f(x)$ and $(a_d/b_D)x^{d-D} g(x)$ are both $a_d x^d$, and so $F$ has lower degree than $f$. By induction we then know that there exists a polynomial $Q(x)$ for which $r(x) = F(x) - Q(x)g(x)$ has degree $< D$, and the result follows taking $q(x) = Q(x) + (a_d/b_D)x^{d-D}$.


To prove that $r$ is unique, suppose that we also have $f(x) = Q(x)g(x) + R(x)$ where $R(x)$ has degree $< D$. Then

$(r - R)(x) = (f(x) - q(x)g(x)) - (f(x) - Q(x)g(x)) = (Q - q)(x)\, g(x);$

that is, $g(x)$ divides $(r - R)(x)$. Now $\deg(r - R) \le \max\{\deg(r), \deg(R)\} < D = \deg g$, and so $r - R = 0$ (a non-zero multiple of $g$ has degree $\ge D$). Therefore $R = r$ and $Q = q$.
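The existence half of this proof is really an algorithm: keep subtracting $(a_d/b_D)x^{d-D}g(x)$ until the degree drops below $D$. Below is an iterative sketch of that recursion over $\mathbb{Q}$. It uses exact `Fraction` arithmetic, since dividing by $b_D$ may leave $\mathbb{Z}$, and it assumes polynomials are stored as coefficient lists with the constant term first. That representation and the names `trim` and `poly_divmod` are our own choices.

```python
from fractions import Fraction

def trim(p):
    """Drop zero leading coefficients; [] stands for the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, for non-zero g.

    Polynomials are coefficient lists, constant term first:
    [1, 0, 2] means 1 + 2x^2.  Arithmetic is exact over Q.
    """
    r = [Fraction(c) for c in trim(f)]
    g = [Fraction(c) for c in trim(g)]
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    D, b_D = len(g) - 1, g[-1]
    q = [Fraction(0)] * max(len(r) - D, 1)
    while len(r) - 1 >= D:
        d, a_d = len(r) - 1, r[-1]
        c = a_d / b_D                      # coefficient of x^(d-D) in q
        q[d - D] += c
        # F(x) := r(x) - c x^(d-D) g(x) has lower degree: leading terms cancel
        for i, b in enumerate(g):
            r[i + d - D] -= c * b
        r = trim(r)
    return trim(q), r

# x^3 - 1 = (x^2 + x + 1)(x - 1) + 0
print(poly_divmod([-1, 0, 0, 1], [-1, 1]) == ([1, 1, 1], []))  # True
```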


Exercise 2.10.1. Prove that if $a \in \mathbb{C}$, then $x - a$ is a factor of $f(x)$ if and only if $f(a) = 0$.


The greatest common divisor of $f(x)$ and $g(x)$ is the monic polynomial $h(x)$ of highest degree which divides both $f(x)$ and $g(x)$. We denote this by

$\gcd(f(x), g(x))_{\mathbb{C}[x]}.$

One can develop a theory analogous to the integer case; for example, one can show that any common divisor of $f$ and $g$ must divide $h(x)$.
We now describe the Euclidean algorithm: Given $f(x), g(x) \in \mathbb{C}[x]$ for which $\deg(f) \ge \deg(g)$, we appeal to Proposition 2.10.1 so that there exists $q(x) \in \mathbb{C}[x]$ for which $r(x) = f(x) - q(x)g(x)$ has degree $< D := \deg(g)$. We claim that


$\gcd(f(x), g(x))_{\mathbb{C}[x]} = \gcd(g(x), r(x))_{\mathbb{C}[x]}.$
To prove this, note that if $P(x)$ divides both $g(x)$ and $r(x)$, then it divides the linear combination $q(x)g(x) + r(x) = f(x)$. Alternatively if $P(x)$ divides both $f(x)$ and $g(x)$, then it divides the linear combination $f(x) - q(x)g(x) = r(x)$. So the two pairs have exactly the same common divisors, and hence the same greatest common divisor.

We deduce that the Euclidean algorithm in $\mathbb{C}[x]$ does indeed yield the greatest common divisor, since the sum of the degrees of the polynomials involved reduces by at least 1 at each step.
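A compact sketch of this Euclidean algorithm in $\mathbb{Q}[x]$ follows. It uses exact rational arithmetic, stores coefficient lists with the constant term first (our own convention), and returns the monic gcd. Only the remainder of each division is needed, so the helper `poly_rem` (a name of our choosing) keeps just that part of the division step.

```python
from fractions import Fraction

def trim(p):
    """Drop zero leading coefficients (lists hold the constant term first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_rem(f, g):
    """Remainder of f on division by a non-zero polynomial g, over Q."""
    r = [Fraction(c) for c in trim(f)]
    g = [Fraction(c) for c in trim(g)]
    D = len(g) - 1
    while len(r) - 1 >= D:
        c, shift = r[-1] / g[-1], len(r) - 1 - D
        for i, b in enumerate(g):
            r[i + shift] -= c * b          # subtract c x^shift g(x)
        r = trim(r)
    return r

def poly_gcd(f, g):
    """Monic gcd of f and g in Q[x], by the Euclidean algorithm."""
    f, g = trim(f), trim(g)
    while g:                               # gcd(f, g) = gcd(g, f mod g)
        f, g = g, poly_rem(f, g)
    if not f:
        return []                          # both inputs were zero
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]

# gcd(x^2 - 1, x^2 + 2x + 1) = x + 1
print(poly_gcd([-1, 0, 1], [1, 2, 1]) == [1, 1])  # True
```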

Exercise 2.10.2. Suppose the Euclidean algorithm gives the polynomials $F_0 = f$, $F_1 = g$, $F_2$, $F_3$, .... Prove that, for all $k \ge 0$, there exist $a_k(x), b_k(x) \in \mathbb{C}[x]$ for which $F_k(x) = a_k(x) f(x) + b_k(x) g(x)$.


In exercise 2.10.2 we proved there exist $a(x), b(x) \in \mathbb{C}[x]$ for which

$a(x) f(x) + b(x) g(x) = \gcd(f(x), g(x))_{\mathbb{C}[x]},$

which we will denote by $h(x)$. We would like to bound the degrees of the polynomials $a(x)$ and $b(x)$ (just as we controlled the sizes of the integers involved in exercise [..7.10). By Proposition 2.10.1 there exists $q(x) \in \mathbb{C}[x]$ for which $A(x) := a(x) - q(x)g(x)$ has degree $< \deg(g)$. Let $B(x) := b(x) + q(x) f(x)$, so that

$A(x) f(x) + B(x) g(x) = a(x) f(x) + b(x) g(x) = h(x).$

Then $\deg(B(x)g(x)) \le \max\{\deg(A(x) f(x)), \deg(h(x))\} < \deg(f) + \deg(g)$, and therefore $\deg(A) < \deg(g)$ and $\deg(B) < \deg(f)$.
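The next sketch is our own and works over $\mathbb{Q}$, with coefficient lists stored constant term first. It runs the Euclidean algorithm while tracking polynomials $a_k(x), b_k(x)$ with $F_k = a_k f + b_k g$, as in exercise 2.10.2. It then applies the reduction $A := a - qg$, $B := b + qf$ described above. It assumes that $f$ and $g$ both have degree at least 1. All helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Coefficients as Fractions, zero leading terms removed (constant term first)."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])

def neg(p):
    return [-c for c in p]

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(f, g):
    """(q, r) with f = q*g + r and deg r < deg g, for non-zero g."""
    r, g = trim(f), trim(g)
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        c, shift = r[-1] / g[-1], len(r) - len(g)
        q[shift] += c
        r = add(r, neg(mul([0] * shift + [c], g)))   # leading terms cancel
    return trim(q), r

def poly_ext_gcd(f, g):
    """Return (h, A, B): h the monic gcd of f and g in Q[x], A*f + B*g = h,
    deg A < deg g and deg B < deg f.  Assumes deg f, deg g >= 1."""
    r0, r1 = trim(f), trim(g)
    a0, b0, a1, b1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:                          # invariant: r_i = a_i*f + b_i*g
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        a0, a1 = a1, add(a0, neg(mul(q, a1)))
        b0, b1 = b1, add(b0, neg(mul(q, b1)))
    lead = r0[-1]                      # make the gcd monic
    h, a, b = ([c / lead for c in p] for p in (r0, a0, b0))
    # the reduction step from the text: A := a - q g, B := b + q f
    q, A = poly_divmod(a, g)
    B = add(b, mul(q, trim(f)))
    return h, A, B

# x^3 - 1 and x^2 - 1 have gcd x - 1 = 1*(x^3 - 1) + (-x)*(x^2 - 1)
print(poly_ext_gcd([-1, 0, 0, 1], [-1, 0, 1]) == ([-1, 1], [1], [0, -1]))  # True
```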




Exercise 2.10.3. (a) Show that A and B (with these degree bounds) are unique, up to a scalar 
multiple. 
(b) Write $f = hF$ and $g = hG$ where $h = (f, g)$. Prove that all solutions of $a(x) f(x) + b(x) g(x) = h(x)$ are given by $a = A + kG$ and $b = B - kF$ for any $k(x) \in \mathbb{C}[x]$.


All of the arguments in this section work for polynomials whose coefficients come from any field in place of $\mathbb{C}$, for example $\mathbb{R}[x]$ or $\mathbb{Q}[x]$. In other words, we can state that for any two given $f(x), g(x) \in \mathbb{Q}[x]$ there exist $a(x), b(x) \in \mathbb{Q}[x]$ with $\deg(a) < \deg(g)$ and $\deg(b) < \deg(f)$, for which

$\gcd(f(x), g(x))_{\mathbb{Q}[x]} = a(x) f(x) + b(x) g(x).$


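One interesting consequence of this construction is that if $f(x), g(x) \in \mathbb{Q}[x]$ have a common root $\alpha$ in $\mathbb{C}$, then they have a common polynomial factor in $\mathbb{Q}[x]$: indeed $h := \gcd(f, g)_{\mathbb{Q}[x]}$ satisfies $h(\alpha) = a(\alpha)f(\alpha) + b(\alpha)g(\alpha) = 0$, so $h$ is non-constant.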


Exercise 2.10.4. Suppose that $f(x), g(x) \in \mathbb{Z}[x]$. Prove that $\gcd(f(x), g(x))_{\mathbb{C}[x]} = 1$ if and only if $\gcd(f(x), g(x))_{\mathbb{Q}[x]} = 1$.
Exercise 2.10.5. (a) Explain why the proof of Proposition 2.10.1 works in any field in place of $\mathbb{C}$.
(b) Prove that the result holds with $f(x), g(x) \in \mathbb{Z}[x]$, whenever $g$ is monic.
(c) When $f(x), g(x) \in \mathbb{Z}[x]$ and $g(x)$ has leading coefficient $c \ne 0$, show that the result follows with “$f(x) = q(x)g(x) + r(x)$” replaced by “$c^{1 + \deg f - \deg g} f(x) = q(x)g(x) + r(x)$”.


2.11. Common factors over rings: Resultants and discriminants 


Now suppose that $f(x), g(x) \in \mathbb{Z}[x]$ have no common polynomial factors in $\mathbb{Z}[x]$. We just saw that there exist $A(x), B(x) \in \mathbb{Q}[x]$ with $\deg(A) < \deg(g)$ and $\deg(B) < \deg(f)$, for which

$A(x) f(x) + B(x) g(x) = 1.$

We multiply $A(x)$ and $B(x)$ through by the least common multiple of the denominators of their coefficients and then divide the resulting polynomials through by the greatest common divisor of their numerators. We therefore obtain $a(x), b(x) \in \mathbb{Z}[x]$ with $\deg(a) < \deg(g)$ and $\deg(b) < \deg(f)$, for which

$a(x) f(x) + b(x) g(x) = R,$

where $R \in \mathbb{Z}[x] \cap \mathbb{Q} = \mathbb{Z}$. We call $R$ the resultant of $f$ and $g$.¹³


A particularly interesting case is when $g(x) = f'(x)$. Then $f$ and $g$ have no common root if and only if $f$ has no repeated factor; that is, $f(x)$ is the product of distinct linear polynomials in $\mathbb{C}[x]$. The resultant of $f$ and $f'$ is called the discriminant of $f$. For example, if $f(x) = ax^2 + bx + c$, then $f'(x) = 2ax + b$. Applying the Euclidean algorithm,

$2f(x) - x f'(x) = bx + 2c$, and then
$2a(bx + 2c) - b f'(x) = -\Delta$, where $\Delta := b^2 - 4ac$.


13 This definition is only guaranteed to agree with the usual definition in algebra books when $f$ and $g$ are both monic. For the non-monic case, the algebra books tend to be misleading.




This yields that the discriminant
$\Delta := b^2 - 4ac = -4a(ax^2 + bx + c) + (2ax + b)^2.$


We have seen, in section 0.8, that the general cubic can be taken to be of the form $f(x) = x^3 + ax + b$, so that $f'(x) = 3x^2 + a$. Therefore the discriminant is

$\Delta := -4a^3 - 27b^2 = 9(2ax - 3b) f(x) - (6ax^2 - 9bx + 4a^2) f'(x).$
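Both identities can be sanity-checked by machine. This is only a spot check over a small grid of integer coefficients, not a proof. The sketch below expands each right-hand side as a polynomial in $x$, stored as a plain integer coefficient list with the constant term first (our own convention). It confirms that every non-constant coefficient vanishes and that the constant term equals the stated discriminant.

```python
from itertools import product

def mul(p, q):
    """Product of integer polynomials given as coefficient lists (constant first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def scale(c, p):
    return [c * x for x in p]

def is_constant(p, value):
    """True if the polynomial p is the constant `value`."""
    return p[0] == value and all(x == 0 for x in p[1:])

def quadratic_identity_holds(a, b, c):
    f, df = [c, b, a], [b, 2 * a]                  # f = ax^2+bx+c, f' = 2ax+b
    rhs = add(scale(-4 * a, f), mul(df, df))       # -4a f(x) + (2ax+b)^2
    return is_constant(rhs, b * b - 4 * a * c)

def cubic_identity_holds(a, b):
    f, df = [b, a, 0, 1], [a, 0, 3]                # f = x^3+ax+b, f' = 3x^2+a
    rhs = add(mul([-27 * b, 18 * a], f),           # 9(2ax - 3b) f(x)
              scale(-1, mul([4 * a * a, -9 * b, 6 * a], df)))  # -(6ax^2 - 9bx + 4a^2) f'(x)
    return is_constant(rhs, -4 * a ** 3 - 27 * b ** 2)

grid = range(-5, 6)
print(all(quadratic_identity_holds(a, b, c) for a, b, c in product(grid, repeat=3)))  # True
print(all(cubic_identity_holds(a, b) for a, b in product(grid, repeat=2)))            # True
```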


2.12. Euclidean domains 


To have (the analogue of) the Euclidean algorithm in a given integral domain $R$ we need something like Lemma 1.1.1 and Proposition 2.10.1; that is, when we divide $a$ by non-zero $b$ in $R$, then the remainder must be “smaller” than $b$. To be precise, $R$ is a Euclidean domain if there exists a function $w : R \to \mathbb{Z}_{\ge 0}$ such that:


For any $a, b \in R$ with $b \ne 0$, there exists $q \in R$ such that if $r := a - qb$, then $w(r) < w(b)$.


Moreover we need that $w(r) = 0$ only if $r = 0$.

When $R = \mathbb{Z}$ we had $w(n) = |n|$; and when $R = \mathbb{C}[x]$ we had $w(f) = \deg f$.
Exercise 2.12.1. Prove that the Euclidean algorithm works in a Euclidean domain. 

Let $R = \mathbb{Z}[i]$ and $w(a + bi) = |a + bi|$. To study divisibility by $\beta = a + bi \ne 0$ we move from the usual basis $1, i$ for the complex plane over $\mathbb{R}$, to the basis $a + bi,\ b - ai$ $(= -i(a + bi))$. Therefore if $\alpha = x + yi$, then


(2.12.1) $\alpha = X(a + bi) - Y(b - ai)$ where $x = aX - bY$ and $y = bX + aY$,


and so we can determine $X$ and $Y$ from given values of $x$ and $y$ by inverting this pair of simultaneous equations. ($X$ and $Y$ are real but not necessarily integers.)


To divide $\alpha = x + yi$ by $\beta = a + bi \ne 0$, we write $\alpha$ as in (2.12.1) and then select $m$ to be the nearest integer to $X$ and $n$ to be the nearest integer to $Y$, so that $|m - X|, |n - Y| \le \frac12$. Let $q = m + ni$ so that


$q\beta = (m + ni)(a + bi) = m(a + bi) - n(b - ai),$


and therefore 


$r := \alpha - q\beta = (X(a + bi) - Y(b - ai)) - (m(a + bi) - n(b - ai))$
$\quad = (X - m)(a + bi) + (n - Y)(b - ai),$


and so 
$w(r) = |r| \le |X - m|\,|a + bi| + |n - Y|\,|b - ai| \le \frac12 |a + bi| + \frac12 |b - ai| = |a + bi| = w(\beta).$


We cannot get equality here or else $a + bi$ and $b - ai = -i(a + bi)$ would be parallel, which is obviously nonsense. Hence $w(r) < w(\beta)$, as desired.
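The division step just described translates directly into code. In this sketch, Gaussian integers are stored as pairs of Python integers (our own representation). The values $X$ and $Y$ are obtained exactly with `Fraction` by solving the two equations in (2.12.1). Sizes are compared via the norm $|z|^2$, which orders elements the same way as $w = |\cdot|$ but stays integer-valued. The function names are hypothetical.

```python
from fractions import Fraction

def nearest_integer(t):
    """An integer m with |m - t| <= 1/2, namely floor(t + 1/2), for a Fraction t."""
    return (2 * t.numerator + t.denominator) // (2 * t.denominator)

def norm(z):
    """|z|^2 for a Gaussian integer z = (real, imag)."""
    return z[0] ** 2 + z[1] ** 2

def gaussian_divmod(alpha, beta):
    """Return (q, r) with alpha = q*beta + r and |r| < |beta| in Z[i].

    Gaussian integers are pairs (real part, imaginary part); beta != 0.
    """
    x, y = alpha
    a, b = beta
    N = norm(beta)
    # Solve x = aX - bY, y = bX + aY from (2.12.1) for the real numbers X, Y.
    X = Fraction(a * x + b * y, N)
    Y = Fraction(a * y - b * x, N)
    m, n = nearest_integer(X), nearest_integer(Y)
    # q = m + ni, and q*beta = (ma - nb) + (mb + na)i
    r = (x - (m * a - n * b), y - (m * b + n * a))
    return (m, n), r

print(gaussian_divmod((7, 2), (2, 1)))  # ((3, -1), (0, 1)), i.e. 7 + 2i = (3 - i)(2 + i) + i
```

In fact, since $a + bi$ and $b - ai$ are orthogonal of equal length, $|r|^2 \le \frac12 |\beta|^2$ here, which is stronger than what the argument above needs.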


Exercise 2.12.2. Prove that $\mathbb{Z}[\omega]$ is a Euclidean domain, where $\omega = \frac{-1 + \sqrt{-3}}{2}$.




Exercise 2.12.3. Suppose that $R$ is a Euclidean domain (as defined above). Prove that for any $a, b \in R$ one can find $g \in R$ such that

• $g$, found using the Euclidean algorithm, divides both $a$ and $b$,
• there exist $u, v \in R$ for which $au + bv = g$, and
• if $d$ also divides both $a$ and $b$, then $w(d) \le w(g)$.

We call $g$ the greatest common divisor of $a$ and $b$, measuring size using the function $w$.


Franz Lemmermeyer’s survey “The Euclidean algorithm in algebraic number fields”, Exposition. Math. 13 (1995), no. 5, 385-416, gives more information on Euclidean domains.


Chapter 3


The basic algebra of number theory


A prime number is an integer n > 1 whose only positive divisors are 1 and n. Hence 2, 3, 5, 7, 11, ... are primes. An integer n > 1 is composite if it is not prime.¹


Exercise 3.0.1. Suppose that p is a prime number. Prove that gcd(p, a) = 1 if and only if p does not divide a.


3.1. The Fundamental Theorem of Arithmetic 


Positive integers factor into primes, the basic building blocks out of which integers are made. Often, in school, one discovers this by factoring a given composite integer into two parts and then factoring each of those parts that are composite into two further parts, etc. For example 120 = 8 × 15, and then 8 = 2 × 4 and 15 = 3 × 5. Now 2, 3, and 5 are all primes, but 4 = 2 × 2 is not. Putting this all together gives 120 = 2 × 2 × 2 × 3 × 5. This can be factored no further since 2, 3, and 5 are all primes. It is not difficult to prove that this always works:


Exercise 3.1.1. Prove that any integer n > 1 can be factored into a product of primes. 


We can factor 120 in other ways. For example 120 = 4 × 30, and then 4 = 2 × 2 and 30 = 5 × 6. Finally noting that 6 = 2 × 3, we eventually obtain the same factorization, 120 = 2 × 2 × 2 × 3 × 5, of 120 into primes, even though we arrived at it in a different way. No matter how you go about splitting a positive integer up into its factors, you will always end up with the same factorization into primes.² If it is true that any two such factorizations are indeed the same and if we are given one factorization of $n$ as $q_1 \cdots q_k$, then every prime factor $p$ of $n$, found in any other way, must equal some $q_j$. This suggests that we will need to prove Theorem 3.1.


1 Notice that 1 is neither prime nor composite, and the same is true of 0 and all negative integers.

2 Recognizing that this claim needs a proof, and then supplying a proof, is one of the great achievements of Greek mathematics. They developed an approach to mathematics which assures that theorems are established on a solid basis.




Theorem 3.1. If prime p divides ab, then p must divide at least one of a and b. 


We will prove this in the next section. The necessity of such a result was appreciated by ancient Greek mathematicians, who went on to show that Theorem 3.1 is sufficient to establish that every integer has a unique factorization, as we will see. It is best to begin by making a simple deduction from Theorem 3.1:

Exercise 3.1.2. (a) Prove that if prime $p$ divides $a_1 a_2 \cdots a_k$, then $p$ divides $a_j$ for some $j$, $1 \le j \le k$.

(b) Deduce that if prime $p$ divides $q_1 \cdots q_k$ where each $q_i$ is prime, then $p = q_j$ for some $j$, $1 \le j \le k$.


With this preparation we are ready to prove the first great theorem of number theory, which appears in Euclid’s “Elements”:


Theorem 3.2 (The Fundamental Theorem of Arithmetic). Every integer n > 1 can be written as a product of primes in a unique way (up to reordering).³


Proof. We first show that there is a factorization of n into primes and afterwards we will prove that it is unique. We prove this by induction on n: If n is prime, then we are done; since 2 and 3 are primes, this also starts our induction hypothesis. If n is composite, then it must have a divisor a for which 1 < a < n, and so b = n/a is also an integer for which 1 < b < n. Then, by the induction hypothesis, both a and b can be factored into primes, and so n = ab equals the product of these two factorizations. (For example, to prove the result for 1050, we note that 1050 = 15 × 70. We have already obtained the factorizations of 15 and 70, namely 15 = 3 × 5 and 70 = 2 × 5 × 7, so that 1050 = 15 × 70 = (3 × 5) × (2 × 5 × 7) = 2 × 3 × 5 × 5 × 7.)


Now we prove that there is just one factorization for each $n \ge 2$. If this is not true, then let $n$ be the smallest integer $\ge 2$ that has two distinct factorizations,

$p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s,$

where the $p_i$ and $q_j$ are (not necessarily distinct) primes. Now prime $p_r$ divides $q_1 q_2 \cdots q_s$, and so $p_r = q_j$ for some $j$, by exercise 3.1.2(b). Reordering the $q_j$ if necessary we may assume that $j = s$, and if we divide through both factorizations by $p_r = q_s$, then we have two distinct factorizations of

$n/p_r = p_1 p_2 \cdots p_{r-1} = q_1 q_2 \cdots q_{s-1}.$

This contradicts the minimality of $n$ unless $n/p_r = 1$. But then $n = p_r$ is prime, and by the definition (of primes) it can have no other factorization.


The Fundamental Theorem of Arithmetic states that there is a unique way to 
break down an integer into its fundamental (i.e., irreducible) parts, and so every 
integer can be viewed simply in terms of these parts (i.e., its prime factors). On 
the other hand any finite product of primes equals an integer, so there is a 1-to-1 
correspondence between positive integers and finite products of primes, allowing one 
to translate questions about integers into questions about primes and vice versa. 


3 When we write that a product of primes is “unique up to reordering” we mean that although one can write 12 as 2 × 2 × 3 or 2 × 3 × 2 or 3 × 2 × 2, we think of all of these as the same product, since they involve the same primes, each the same number of times, differing only in the way that we order the prime factors.




It is useful to write the factorizations of natural numbers $n$ in a standard form, like

$n = 2^{n_2} 3^{n_3} 5^{n_5} 7^{n_7} \cdots = \prod_{p \text{ prime}} p^{n_p},$

where $n_p$ denotes the exact number of times the prime $p$ divides $n$. Since $n$ is an integer, each $n_p \ge 0$, and only finitely many of the $n_p$ are non-zero. Usually we write down only those prime powers where $n_p \ge 1$, for example $12 = 2^2 \cdot 3$ and $50 = 2 \cdot 5^2$. We will write $p^e \| n$ if $p^e$ is the highest power of $p$ that divides $n$; thus $5^2 \| 50$ and $11 \| 1001$.
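To produce the exponents $n_p$ for small $n$, the simplest approach is trial division, sketched below. The name `factorize` is our own. It returns a dictionary mapping each prime $p$ dividing $n$ to its exponent $n_p$. As the next paragraphs stress, this approach is far too slow for large $n$.

```python
def factorize(n):
    """Return {p: n_p} with n = prod p^n_p, by trial division (n >= 1)."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:              # strip out every factor of p
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                          # what remains has no factor <= sqrt(n): a prime
        exponents[n] = exponents.get(n, 0) + 1
    return exponents

print(factorize(120))   # {2: 3, 3: 1, 5: 1}
print(factorize(1050))  # {2: 1, 3: 1, 5: 2, 7: 1}
```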


Our proof of the Fundamental Theorem of Arithmetic is constructive but it does not provide an efficient way to find the prime factors of a given integer n. Indeed finding efficient techniques for factoring an integer is a difficult and important problem, which we discuss in chapter 10.⁴


In particular, the known difficulty of factoring large integers underlies the security of the RSA cryptosystem, which is discussed in section 10.3.


Exercise 3.1.3. (a) Prove that every natural number has a unique representation as $2^k m$ with $k \ge 0$ and $m$ an odd natural number.

(b) Show that each integer $n \ge 3$ is either divisible by 4 or has at least one odd prime factor.

(c) An integer is squarefree if every prime in its factorization appears to the power 1. Prove that every non-zero integer can be written, uniquely, in the form $mn^2$ where $m$ is a squarefree integer and $n$ is a positive integer.

(d)† Deduce that every non-zero rational number can be written, uniquely, in the form $mr^2$ where $m$ is a squarefree integer and $r$ is a positive rational number.


Exercise 3.1.4. (a) Show that if all of the prime factors of an integer $n$ are $\equiv 1 \pmod m$, then $n \equiv 1 \pmod m$. Deduce that if $n \not\equiv 1 \pmod m$ then $n$ has a prime factor that is $\not\equiv 1 \pmod m$.
(b)* Show that if all of the prime factors of an integer $n$ are $\equiv 1$ or $3 \pmod 8$, then $n \equiv 1$ or $3 \pmod 8$. Prove this with 3 replaced by 5 or 7.
(c)† Generalize this as much as you can to other moduli and other sets of congruence classes.


3.2. Abstractions 


The ancient Greek mathematicians recognized that abstract lemmas allowed them to prove sophisticated theorems. For example, in the previous section we stated Theorem 3.1, a result whose formulation is not obviously relevant and yet was used to good effect. The archetypal lemma is known today as “Euclid’s Lemma”, an important result that first appeared in Euclid’s “Elements” (Book VII, No. 32), and we will see that it is even more useful than Theorem 3.1.


Theorem 3.3 (Euclid’s Lemma). If c divides ab and gcd(c, a) = 1, then c must divide b.


4 It is easy enough to multiply together two given integers. If the integers each have 50 digits, then one can obtain the product in about 3,000 steps (digit-by-digit multiplications) and this can be accomplished within a second on a computer. On the other hand, given the 100-digit product, how do we factor it to find the original two 50-digit integers? Trial division is too slow ... if every atom in the universe were a computer as powerful as any supercomputer, then most such products would not be factored before the end of the universe! This is why we need more sophisticated factoring methods, and although the best ones known today, implemented on the best computers, can factor a 100-digit number in reasonable time, they are currently incapable of factoring typical 200-digit numbers. (See sections 10.4 and 10.6 for further discussion on this theme.)




Proof of Euclid’s Lemma. Since gcd(c, a) = 1 there exist integers m and n such that cm + an = 1 by Theorem 1.1. Now c divides both c and ab, so that


c divides c · bm + ab · n = b(cm + an) = b,


by exercise 1.1.1(c).


This proof surprisingly uses, implicitly, the complicated construction from Euclid’s algorithm. Now that we have proved Euclid’s Lemma, we proceed to the


Deduction of Theorem 3.1. Suppose that prime p does not divide a (or else we are done), and so gcd(p, a) = 1 (as seen in exercise 3.0.1). Taking c = p in Euclid’s Lemma, we deduce that p divides b.


The hypothesis “gcd(c, a) = 1” in Euclid’s Lemma is necessary, as may be seen from the example in which 4 divides 2 · 6, but 4 does not divide either 2 or 6.


Now that we have completed the proof of the Fundamental Theorem of Arithmetic, we are ready to develop the basic number-theoretic properties of integers.⁵ We begin by noting one further important consequence of Euclid’s Lemma:


Corollary 3.2.1. If am = bn, then a/ gcd(a,b) divides n. 


Proof. Let a/gcd(a, b) = A and b/gcd(a, b) = B so that (A, B) = 1 by exercise 1.2.5(a), and Am = Bn. Therefore A | Bn with (A, B) = 1, and so A | n by Euclid’s Lemma, as desired. We also observe that if we write n = Ak for some integer k, then m = Bn/A = Bk.


One consequence is a simple way to determine the least common multiple of 
two integers from knowing their greatest common divisor. 


Corollary 3.2.2. For any positive integers a and b we have ab = gcd(a, b) · lcm[a, b].


Proof. By definition, there exist integers m and n for which am = bn = lcm[a, b]. By Corollary 3.2.1 we know that a/gcd(a, b) divides n and so L := b · a/gcd(a, b) divides bn = lcm[a, b]. Therefore L ≤ lcm[a, b]. On the other hand L is a multiple of b, by definition, and of a, since L = a · b/gcd(a, b). Therefore L is a common multiple of a and b, and so L ≥ lcm[a, b] by the definition of lcm. These two inequalities imply that L = lcm[a, b], and the result follows by multiplying through by gcd(a, b).


We will see an easier proof of this elegant result in exercise 3.3.2.


Exercise 3.2.1. Suppose that (a,b) = 1. Prove that if a and b both divide m, then ab divides m. 


5 However if we wish to develop the analogy of this theory for more complicated sets of numbers, for example the numbers of the form $\{a + b\sqrt{d} : a, b \in \mathbb{Z}\}$ for some fixed large integer $d$, then Euclid’s Lemma generalizes in a straightforward way, but the Fundamental Theorem of Arithmetic does not. We discuss this further in appendix 3F.




3.3. Divisors using factorizations 


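Suppose that⁶

$n = \prod_{p \text{ prime}} p^{n_p}, \quad a = \prod_p p^{a_p}, \quad \text{and} \quad b = \prod_p p^{b_p}.$

If $n = ab$, then

$2^{n_2} 3^{n_3} 5^{n_5} \cdots = 2^{a_2} 3^{a_3} 5^{a_5} \cdots \times 2^{b_2} 3^{b_3} 5^{b_5} \cdots = 2^{a_2 + b_2} 3^{a_3 + b_3} 5^{a_5 + b_5} \cdots.$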
As there is only one factorization into primes of a given positive integer, by the Fundamental Theorem of Arithmetic, we can equate the exact power of prime $p$ dividing each side of the last equation, to deduce that

$n_p = a_p + b_p$ for each prime $p$.

As $a_p, b_p \ge 0$ for each prime $p$, therefore

$0 \le a_p, b_p \le n_p$ for each prime $p$.

On the other hand if $a = 2^{a_2} 3^{a_3} 5^{a_5} \cdots$ with each $0 \le a_p \le n_p$, then $a$ divides $n$ since we can construct the integer

$b = 2^{n_2 - a_2} 3^{n_3 - a_3} 5^{n_5 - a_5} \cdots$

for which $n = ab$. We have therefore classified all of the possible (positive integer) divisors $a$ of $n$.

This classification allows us to easily count the number of divisors $a$ of $n$, since this is equal to the number of possibilities for the exponents $a_p$; and we have that each $a_p$ is any integer in the range $0 \le a_p \le n_p$. There are, therefore, $n_p + 1$ possibilities for the exponent $a_p$, for each prime $p$, making

$(n_2 + 1)(n_3 + 1)(n_5 + 1) \cdots$

possible divisors in total. Hence if we write $\tau(n)$ for the number of divisors of $n$, then

$\tau(n) = \prod_{\substack{p \text{ prime} \\ p^{n_p} \| n}} (n_p + 1) = \prod_{\substack{p \text{ prime} \\ p^{n_p} \| n}} \tau(p^{n_p}),$

and $\tau(p^k) = k + 1$ for all integers $k \ge 0$. A function whose value at $n$ equals the product of the values of the function at the exact prime powers that divide $n$ is called a multiplicative function (which will be explored in detail in the next chapter).


As an example, we see that the divisors of $175 = 5^2 7^1$ are given by
$5^0 7^0 = 1,\ 5^1 7^0 = 5,\ 5^2 7^0 = 25,\ 5^0 7^1 = 7,\ 5^1 7^1 = 35,\ 5^2 7^1 = 175;$
in other words, they can all be factored as
$5^0$, $5^1$, or $5^2$ times $7^0$ or $7^1$.
Therefore the number of divisors is $(2 + 1) \times (1 + 1) = 3 \times 2 = 6$.
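The classification of divisors is itself a recipe: choose each exponent $a_p$ independently with $0 \le a_p \le n_p$. The sketch below uses our own helper names and factors by trial division, which is fine for small $n$. It lists the divisors this way and computes $\tau(n)$ as the product of the $n_p + 1$.

```python
from itertools import product

def factorize(n):
    """Return {p: n_p} with n = prod p^n_p, by trial division (n >= 1)."""
    exponents, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exponents[n] = exponents.get(n, 0) + 1
    return exponents

def divisors(n):
    """All positive divisors of n: one for each choice 0 <= a_p <= n_p."""
    f = factorize(n)
    primes = list(f)
    divs = []
    for exps in product(*(range(f[p] + 1) for p in primes)):
        d = 1
        for p, a in zip(primes, exps):
            d *= p ** a
        divs.append(d)
    return sorted(divs)

def tau(n):
    """Number of divisors, as the product of (n_p + 1) over p^(n_p) || n."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result

print(divisors(175))  # [1, 5, 7, 25, 35, 175]
print(tau(175))       # 6
```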


Use the Fundamental Theorem of Arithmetic in all of the remaining exercises 
in this section. 


6 We suppress writing “prime” in the subscript of $\prod$, for convenience, at least when it should be obvious, from the context, that the parameter is only taking prime values.




Exercise 3.3.1. Use the description of the divisors of a given integer to prove the following: If $m = \prod_p p^{m_p}$ and $n = \prod_p p^{n_p}$ are positive integers, then (a) $\gcd(m, n) = \prod_p p^{\min\{m_p, n_p\}}$ and (b) $\mathrm{lcm}[m, n] = \prod_p p^{\max\{m_p, n_p\}}$.


The method in exercise 3.3.1(a) for finding the gcd of two integers appears to be much simpler than the Euclidean algorithm. However, in order to make this method work, one needs to be able to factor the integers involved. We have not yet discussed techniques for factoring integers (though we will in chapter 10). Factoring is typically difficult for large integers. This difficulty limits when we can, in practice, use exercise 3.3.1 to determine gcds and lcms. On the other hand, the Euclidean algorithm is very efficient for finding the gcd of two given integers (as discussed in appendix 1B) without needing to know anything about those numbers.


Exercise 3.3.2. Deduce that mn = gcd(m, n) · lcm[m, n] for all pairs of natural numbers m and n using exercise 3.3.1. (The proof in Corollary 3.2.2 is more difficult.)


In combination with the Euclidean algorithm, the result in exercise 3.3.2 allows us to quickly and easily calculate the lcm of any two given integers. For example, to determine lcm[12, 30], we first use the Euclidean algorithm to show that gcd(12, 30) = 6, and then lcm[12, 30] = 12 · 30/gcd(12, 30) = 360/6 = 60.
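The computation just described takes only a few lines. The sketch below implements the Euclidean algorithm directly rather than calling Python's built-in `math.gcd`. It then applies $mn = \gcd(m, n) \cdot \mathrm{lcm}[m, n]$ from exercise 3.3.2, for positive integers.

```python
def gcd(a, b):
    """Euclidean algorithm: gcd(a, b) = gcd(b, a mod b)."""
    while b:
        a, b = b, a % b
    return abs(a)

def lcm(a, b):
    """lcm[a, b] = ab / gcd(a, b), for positive integers a and b."""
    return a * b // gcd(a, b)

print(gcd(12, 30), lcm(12, 30))  # 6 60
```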


Although we have already proved the results in the next exercise (exercise 1.2.1(a), Lemma 4.1, exercise 1.2.5(a), and Corollary 1.2.2), we can now reprove them more easily by using our description of the divisors of a given integer.


Exercise 3.3.3. (a) Prove that d divides gcd(a, b) if and only if d divides both a and b.
(b) Prove that lcm[a, b] divides m if and only if a and b both divide m.
(c) Prove that if (a, b) = g, then (a/g, b/g) = 1.
(d) Prove that if (a, m) = (b, m) = 1, then (ab, m) = 1.
(e) Prove that if (a, b) = 1, then (ab, m) = (a, m)(b, m).
(f)† Show that the hypothesis (a, b) = 1 is necessary in part (e), by constructing a counterexample to the conclusion when (a, b) > 1.


One can obtain the gcd and lcm for any number of integers by means similar
to exercise 3.3.1.


Example. If $A = 504 = 2^3\cdot 3^2\cdot 7$, $B = 2880 = 2^6\cdot 3^2\cdot 5$, and $C = 864 = 2^5\cdot 3^3$,
then the greatest common divisor is $2^3\cdot 3^2 = 72$ and the least common multiple is
$2^6\cdot 3^3\cdot 5\cdot 7 = 60480$. That is, if the powers of prime $p$ that divide $A$, $B$, and $C$ are
$a_p$, $b_p$, and $c_p$, respectively, then the powers of $p$ that divide the gcd and lcm are
$\min\{a_p, b_p, c_p\}$ and $\max\{a_p, b_p, c_p\}$, respectively.
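Here is a short Python sketch for several integers. It is illustrative only and assumes positive inputs. Rather than factoring, it folds the two-integer gcd and lcm across the list. The associativity in exercise 3.3.4 is what justifies this. The names `gcd_many` and `lcm_many` are hypothetical.

```python
from functools import reduce
from math import gcd

def gcd_many(*nums: int) -> int:
    # Folds left: gcd(gcd(a, b), c), which equals gcd(a, gcd(b, c)).
    return reduce(gcd, nums)

def lcm_many(*nums: int) -> int:
    # Folds left with lcm[x, y] = x*y / gcd(x, y).
    return reduce(lambda x, y: x * y // gcd(x, y), nums)

A, B, C = 504, 2880, 864
print(gcd_many(A, B, C), lcm_many(A, B, C))   # 72 60480
```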

Exercise 3.3.4. Prove that gcd(a, b, c) = gcd(a, gcd(b, c)) and lcm[a, b, c] = lcm[a, lcm[b, c]].


Exercise 3.3.5. Prove that if each of a,b,c,... is coprime with m, then so is abc.... 


The representation of an integer in terms of its prime power factors can be 
useful when considering powers of integers: 


Exercise 3.3.6. Prove that if prime $p$ divides $a^n$, then $p^n$ divides $a^n$.


Exercise 3.3.7. (a) Prove that a positive integer A is the square of an integer if and only if 
the exponent of each prime factor of A is even. 
(b) Prove that if a,b,c,... are pairwise coprime, positive integers and their product is a square, 
then they are each a square. 




(c) Prove that if $ab$ is a square, then either $a = gA^2$ and $b = gB^2$, or $a = -gA^2$ and $b = -gB^2$,
where $g = \gcd(a, b)$, for some coprime integers $A$ and $B$.
Exercise 3.3.8. (a) Prove that a positive integer A is the nth power of an integer if and only 
if n divides the exponent of all of the prime power factors of A. 
(b) Prove that if a,b,c,... are pairwise coprime, positive integers and their product is an nth 
power, then they are each an nth power. 


3.4. Irrationality 


One of the most beautiful applications of the Fundamental Theorem of Arithmetic
is its use in showing that there are real irrational numbers,⁷ the easiest example
being $\sqrt{2}$:


Proposition 3.4.1. The real number $\sqrt{2}$ is irrational. That is, there is no rational
number $a/b$ for which $\sqrt{2} = a/b$.


Proof. We will assume that $\sqrt{2}$ is rational and find a contradiction. If $\sqrt{2}$ is
rational, then we can write $\sqrt{2} = a/b$ where $a$ and $b \geq 1$ are coprime integers by
exercise b). We have $a = b\sqrt{2} > 0$.

Now $a = b\sqrt{2}$ and so $a^2 = 2b^2$. If we factor
$$a = \prod_p p^{a_p} \quad\text{and}\quad b = \prod_p p^{b_p}, \quad\text{then}\quad \prod_p p^{2a_p} = a^2 = 2b^2 = 2\prod_p p^{2b_p},$$
where the $a_p$'s and $b_p$'s are all non-negative integers. The exponent of the prime 2 in the
factorization of $a^2 = 2b^2$ is $2a_2 = 1 + 2b_2$, which is impossible (mod 2), giving a
contradiction. Hence $\sqrt{2}$ cannot be rational.
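The heart of the argument is a parity check on the exponent of 2. The following Python spot-check is only an illustration and proves nothing beyond the range it tests. It computes that exponent with a hypothetical helper `v2`.

```python
def v2(n: int) -> int:
    """Exponent of the prime 2 in the factorization of a positive integer n."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e

# v2(a^2) = 2*v2(a) is even while v2(2*b^2) = 1 + 2*v2(b) is odd, so a^2 and 2*b^2
# can never have the same power of 2; this only spot-checks small a and b.
even = all(v2(a * a) % 2 == 0 for a in range(1, 1000))
odd = all(v2(2 * b * b) % 2 == 1 for b in range(1, 1000))
print(even, odd)   # True True
```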


More generally we have, by a different proof, 


Proposition 3.4.2. If $d$ is an integer for which $\sqrt{d}$ is rational, then $\sqrt{d}$ is an
integer. Therefore if the integer $d$ is not the square of an integer, then $\sqrt{d}$ is irrational.


Proof. We may write $\sqrt{d} = a/b$ where $a$ and $b$ are coprime positive integers, so
that $a^2 = db^2$. Now $(a^2, b^2) = 1$ and $a^2$ divides $db^2$, which implies that $a^2$ divides
$d$, by Euclid's Lemma. But then $d \leq db^2 = a^2 \leq d$, implying that $d = a^2$; that is,
$d$ is the square of an integer as claimed.


Exercise 3.4.1. Give a proof of Proposition which is analogous to the proof of Proposition 
above. 


Exercise 3.4.2.† Prove that $17^{1/3}$ is irrational (using the ideas of the proof of Proposition 3.4.1).


The proof of Proposition 3.4.2 generalizes to give a nice application of Euclid’s
Lemma to rational roots of arbitrary polynomials with integer coefficients:


Theorem 3.4 (The rational root criterion). Suppose $f(x)$ is a polynomial with
integer coefficients, with leading coefficient $a_d$ and last coefficient $a_0$. If $f(m/n) = 0$
where $m$ and $n$ are coprime integers, then $m$ divides $a_0$ and $n$ divides $a_d$.


7 That is, real numbers that are not rational.




Proof. Writing $f(x) = \sum_{j=0}^{d} a_j x^j$ where each $a_j \in \mathbb{Z}$, we have
$$a_d m^d + a_{d-1} m^{d-1} n + \cdots + a_1 m n^{d-1} + a_0 n^d = n^d f(m/n) = 0.$$


Reducing this equation mod $n$ gives $a_d m^d \equiv 0 \pmod{n}$, as every other term on the
left-hand side is divisible by $n$. This can be restated as $n$ divides $a_d m^d$. By the
hypothesis, we have $(n,m) = 1$ and so $(n, m^d) = 1$ by exercise 7.11. Therefore $n$
divides $a_d m^d$ and $(n, m^d) = 1$, which implies that $n$ divides $a_d$ by Euclid’s Lemma.
We complete the proof by establishing


Exercise 3.4.3. Prove that $m$ divides $a_0$ by reducing the above equation mod $m$.


Corollary 3.4.1. If a monic polynomial $f(x) \in \mathbb{Z}[x]$ has a rational root, then that
root must be an integer.


Proof. We have $a_d = 1$ as $f$ is monic. Therefore $n = \pm 1$ in the rational root
criterion, which implies that $m/n = \pm m$, an integer.


We can apply Corollary to the rational roots of the polynomial $x^n - d$,
and so we deduce that if $d^{1/n}$ is rational, then $d^{1/n}$ is an integer (and therefore if
$d^{1/n}$ is not an integer, then it is irrational), generalizing Proposition 3.4.2.
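The rational root criterion leaves only finitely many candidates, so finding rational roots becomes a finite search. Here is a minimal Python sketch. It assumes integer coefficients with $a_d \neq 0$ and $a_0 \neq 0$; otherwise, factor out a power of $x$ first. The name `rational_roots` is illustrative, and the naive divisor search is only meant for small coefficients.

```python
from fractions import Fraction

def divisors(n: int) -> list[int]:
    """Positive divisors of a nonzero integer n (naive, for small n)."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs: list[int]) -> set[Fraction]:
    """Rational roots of a_d x^d + ... + a_1 x + a_0, given coeffs = [a_d, ..., a_0].
    Assumes a_d != 0 and a_0 != 0. By Theorem 3.4 a root m/n in lowest terms has
    m | a_0 and n | a_d, so only finitely many candidates need to be tested."""
    a_d, a_0 = coeffs[0], coeffs[-1]
    roots = set()
    for m in divisors(a_0):
        for n in divisors(a_d):
            for r in (Fraction(m, n), Fraction(-m, n)):
                value = Fraction(0)
                for c in coeffs:            # Horner's rule, in exact arithmetic
                    value = value * r + c
                if value == 0:
                    roots.add(r)
    return roots

print(rational_roots([1, 0, -2]))      # set(): x^2 - 2 has no rational root
print(rational_roots([1, 0, 0, -8]))   # {Fraction(2, 1)}: x^3 - 8 = (x - 2)(x^2 + 2x + 4)
print(rational_roots([2, -3, 1]))      # the roots 1 and 1/2 of 2x^2 - 3x + 1
```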


We have now proved that there exist infinitely many irrational numbers, the
numbers $\sqrt{d}$ when $d$ is not the square of an integer. This caused important
philosophical conundrums for the early Greek mathematicians.⁸


Exercise 3.4.4. Prove that the polynomial $x^2 - 3x - 1$ is irreducible over $\mathbb{Q}$.


3.5. Dividing in congruences 


We are now ready to return to the topic of dividing both sides of a congruence
through by a given divisor, resolving the conundrums raised in section 2.2.


Lemma 3.5.1. If $d$ divides both $a$ and $b$ and $a \equiv b \pmod{m}$, then
$$a/d \equiv b/d \pmod{m/g} \quad\text{where } g = \gcd(d,m).$$


8 Ancient Greek mathematicians did not think of numbers as an abstract concept, but rather as 
units of measurement. That is, one starts with fixed length measures and determines what lengths can 
be measured by a combination of those original lengths: A stick of length a can be used to measure any 
length that is a positive integer multiple of a (by measuring out k copies of length a, one after another). 
Theorem 1.1 can be interpreted as stating that if one has measuring sticks of length a and b, then one
can measure length gcd(a, b) by measuring out u copies of length a and then v copies of length b, to get 
total length au + bv = gcd(a,b). One can then measure out any multiple of gcd(a,b) by copying the 
above construction that many times. 

Pythagoras (c. 570-495 B.C.) traveled to Egypt and perhaps India in his youth on his quest for
understanding. In 530 B.C. he founded a mystical sect in Croton, a Greek colony in southern Italy,
which developed influential philosophical theories. Pythagoreans believed that numbers must be
constructible in a finite number of steps from a finite given set of lengths and so erroneously concluded that
no irrational number could be constructed in this way. However an isosceles right-angled triangle with
two sides of length 1 has a hypotenuse of length $\sqrt{2}$, and so the Pythagoreans believed that $\sqrt{2}$ must be
a rational number. When one of them proved Proposition it contradicted their whole philosophy
and so was suppressed, “for the unspeakable should always be kept secret”!

We looked at what types of lengths are “constructible” using only a compass and a straight edge 
in section of appendix OG. In fact, although the constructible lengths are quite restricted, they are, 
nonetheless, a far richer set of numbers than just the rational numbers. 

The Pythagoreans similarly associated the four regular polyhedra that were then known (the
Platonic solids, after Plato) with the four “elements” (the tetrahedron with fire, the cube with earth,
the octahedron with air, and the icosahedron with water) and so believed that there could be no others.
They also suppressed their discovery of a fifth regular polyhedron, the dodecahedron.




Proof. As $d$ divides both $a$ and $b$, we may write $a = dA$ and $b = dB$ for some
integers $A$ and $B$, so that $dA \equiv dB \pmod{m}$. Hence $m$ divides $d(A - B)$ and
therefore $\frac{m}{g}$ divides $\frac{d}{g}(A - B)$. Now $\gcd(\frac{m}{g}, \frac{d}{g}) = 1$ by exercise 1.2.5(a), and so $\frac{m}{g}$
divides $A - B$ by Euclid’s Lemma. This is the result that was claimed.


For example, $14 \equiv 91 \pmod{77}$. Now $14 = 7 \times 2$ and $91 = 7 \times 13$, and so
we divide 7 out from 77 to obtain $2 \equiv 13 \pmod{11}$. More interestingly $12 \equiv 42 \pmod{15}$,
and 6 divides both 12 and 42. However 6 does not divide 15, so we cannot
divide this out from 15, but rather we divide out by $\gcd(15,6) = 3$ to obtain $2 \equiv 7 \pmod{5}$.
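Both examples can be reproduced with a small Python sketch of Lemma 3.5.1. It is illustrative and assumes $m > 0$. The name `divide_congruence` is hypothetical.

```python
from math import gcd

def divide_congruence(a: int, b: int, m: int, d: int) -> tuple[int, int, int]:
    """Lemma 3.5.1: if d | a, d | b and a ≡ b (mod m), return (a/d, b/d, m/g) with
    g = gcd(d, m), so that a/d ≡ b/d (mod m/g)."""
    if a % d or b % d or (a - b) % m:
        raise ValueError("need d | a, d | b and a ≡ b (mod m)")
    g = gcd(d, m)
    return a // d, b // d, m // g

print(divide_congruence(14, 91, 77, 7))   # (2, 13, 11)
print(divide_congruence(12, 42, 15, 6))   # (2, 7, 5)
```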


Corollary 3.5.1. Suppose that (a,m) = 1. 
(i) $u \equiv v \pmod{m}$ if and only if $au \equiv av \pmod{m}$.
(ii) The residues

$$a\cdot 0,\ a\cdot 1,\ \ldots,\ a\cdot(m-1) \tag{3.5.1}$$


form a complete set of residues (mod m). 


Proof. (i) The third congruence of Lemma 2.1.1 implies that if $u \equiv v \pmod{m}$,
then $au \equiv av \pmod{m}$. In the other direction, we take $a, b, d$ in Lemma 3.5.1 to
equal $au, av, a$, respectively. Then $g = (a,m) = 1$, and so $au \equiv av \pmod{m}$
implies that $u \equiv v \pmod{m}$ by Lemma 3.5.1.

(ii) By part (i) we know that the residues $a\cdot 0, a\cdot 1, \ldots, a\cdot(m-1)$ are distinct mod $m$.
Since there are $m$ of them, they must form a complete set of residues (mod $m$).
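Here is a quick Python check of Corollary 3.5.1(ii) for one modulus. It is an illustration, not a proof. The helper `is_complete_residue_system` is hypothetical. For this modulus the check also confirms that completeness fails when $(a,m) > 1$.

```python
from math import gcd

def is_complete_residue_system(residues: list[int], m: int) -> bool:
    """True if the residues meet every class mod m exactly once."""
    return sorted(r % m for r in residues) == list(range(m))

m = 12
for a in range(1, 3 * m):
    multiples = [a * i for i in range(m)]      # a*0, a*1, ..., a*(m-1)
    # Complete when (a, m) = 1; if (a, m) = g > 1 then every a*i mod m is a
    # multiple of g, so the class of 1 is never reached.
    assert is_complete_residue_system(multiples, m) == (gcd(a, m) == 1)

print(is_complete_residue_system([5 * i for i in range(12)], 12))   # True
print(is_complete_residue_system([4 * i for i in range(12)], 12))   # False
```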


Corollary 3.5.1(ii) states that the residues in (3.5.1) form a complete set of
residues (mod $m$). In particular one of them is congruent to 1 (mod $m$), and so we
deduce the following:


Corollary 3.5.2. If $(a,m) = 1$, then there exists an integer $r$ such that $ar \equiv 1 \pmod{m}$.
We call $r$ the inverse of $a$ (mod $m$). We denote this by $1/a$ (mod $m$),
or $a^{-1}$ (mod $m$); some authors write $\bar{a}$ (mod $m$).


Third proof of Theorem (for any positive integers $a, b$, there exist integers
$u$ and $v$ such that $au + bv = \gcd(a,b)$). Let $g = \gcd(a,b)$ and write $a = gA$, $b = gB$
so that $(A, B) = 1$. By Corollary 3.5.2 there exists an integer $r$ such that $Ar \equiv 1 \pmod{B}$,
and so there exists an integer $s$ such that $Ar - 1 = Bs$; that is, $Ar - Bs = 1$.
Therefore $ar - bs = g(Ar - Bs) = g\cdot 1 = g = \gcd(a,b)$, as desired.


This also goes in the other direction: 


Second proof of Corollary. Since $(a,m) = 1$, by Theorem 1.1 there exist integers $u$ and $v$
such that $au + mv = 1$, and so
$$au \equiv au + mv = 1 \pmod{m}.$$
Therefore $u$ is the inverse of $a$ (mod $m$).
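This proof is constructive: the extended Euclidean algorithm produces $u$ and $v$ with $au + mv = 1$, and hence the inverse. Here is a minimal Python sketch, assuming $m \geq 2$. The names `extended_gcd` and `inverse_mod` are illustrative.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b), for a, b >= 0."""
    # Invariants: old_r = a*old_u + b*old_v and r = a*u + b*v.
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def inverse_mod(a: int, m: int) -> int:
    """Inverse of a (mod m), read off from a*u + m*v = 1."""
    g, u, _ = extended_gcd(a % m, m)
    if g != 1:
        raise ValueError("a has no inverse mod m, since gcd(a, m) > 1")
    return u % m

print(inverse_mod(17, 12))   # 5, since 17*5 = 85 = 7*12 + 1
```

Since Python 3.8 the built-in `pow(a, -1, m)` computes the same inverse, and it raises `ValueError` when none exists.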




Exercise 3.5.1. Assume that (a,m) = 1. 
(a) Prove that if $b$ is an integer, then $a\cdot 0 + b, a\cdot 1 + b, \ldots, a\cdot(m-1) + b$ form a complete set of
residues (mod $m$).
(b) Deduce that for all given integers $b$ and $c$, there is a unique value of $x$ (mod $m$) for which
$ax + b \equiv c \pmod{m}$.


If $(a,m) = 1$, then we can (unambiguously) express the root of $ax \equiv c \pmod{m}$
as $ca^{-1}$ (mod $m$), or $c/a$ (mod $m$); we take this to mean the residue class mod $m$
which contains the unique value from exercise 3.5.1(b). For example $19/17 \equiv 11 \pmod{12}$.
Such quotients share all the properties described in Lemma 2.1.1.
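For instance, $19/17 \equiv 11 \pmod{12}$ can be checked with Python's built-in modular inverse `pow(a, -1, m)`, which requires Python 3.8 or later. The wrapper `divide_mod` is hypothetical.

```python
from math import gcd

def divide_mod(c: int, a: int, m: int) -> int:
    """The residue c/a (mod m): the unique x in 0..m-1 with a*x ≡ c (mod m),
    assuming gcd(a, m) = 1 (exercise 3.5.1(b))."""
    if gcd(a, m) != 1:
        raise ValueError("need gcd(a, m) = 1")
    return c * pow(a, -1, m) % m

print(divide_mod(19, 17, 12))   # 11
```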


Exercise 3.5.2. Prove that if $\{r_1, \ldots, r_k\}$ is a reduced set of residues mod $m$ and $(a,m) = 1$,
then $\{ar_1, \ldots, ar_k\}$ is also a reduced set of residues mod $m$.


Exercise 3.5.3. (a) Show that there exists $r$ (mod $b$) for which $ar \equiv c \pmod{b}$ if and only if
$\gcd(a, b)$ divides $c$.
(b)† Prove that the solutions $r$ are precisely the elements of a residue class mod $b/\gcd(a, b)$.


Exercise 3.5.4. Prove that if $(a,m) > 1$, then there does not exist an integer $r$ such that $ar \equiv 1 \pmod{m}$.
(And so Corollary could have been phrased as an “if and only if” condition.)


Exercise 3.5.5. Explain how the Euclidean algorithm may be used to efficiently determine the 
inverse of a (mod m) whenever (a,m) = 1. (Calculating the inverse of a (mod m) is an essential 
part of the RSA algorithm discussed in section 10.3.)


3.6. Linear equations in two unknowns 


Given integers a,b,c, can we determine all of the integer solutions m,n to 
am+bn=c? 
Example. To find all integer solutions to $4m + 6n = 10$, we begin by noting that
we can divide through by 2 to get $2m + 3n = 5$. There is clearly a solution,
$2\cdot 1 + 3\cdot 1 = 5$. Therefore
$$2m + 3n = 5 = 2\cdot 1 + 3\cdot 1,$$
so that $2(m - 1) = 3(1 - n)$. We therefore need to find all integer solutions $u, v$ to
$$2u = 3v,$$
and then the general solution to our original equation is given by $m = 1 + u$, $n = 1 - v$,
as we run over the possible pairs $u, v$. Now $2 \mid 3v$ and $(2,3) = 1$ so that
$2 \mid v$. Hence we may write $v = 2\ell$ for some integer $\ell$ and then deduce that $u = 3\ell$.
Therefore all integer solutions to $4m + 6n = 10$ take the form
$$m = 1 + 3\ell, \quad n = 1 - 2\ell, \quad \text{for some integer } \ell.$$


We can imitate this procedure to establish a general result: 


Theorem 3.5. Let $a, b, c$ be given integers. There are solutions in integers $m, n$
to $am + bn = c$ if and only if $(a,b)$ divides $c$. Given a first solution, say $r, s$ (which
can be found using the Euclidean algorithm), all integer solutions to $am + bn = c$
are then given by the formula
$$m = r + \frac{b}{(a,b)}\,\ell, \quad n = s - \frac{a}{(a,b)}\,\ell \quad \text{for some integer } \ell.$$




The full set of real solutions to $ax + by = c$ is given by
$$x = r + kb, \quad y = s - ka, \quad \text{where } k \text{ is an arbitrary real number.}$$
By Theorem 3.5, these are integer solutions exactly when $k = \ell/(a,b)$ for some
$\ell \in \mathbb{Z}$.
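Theorem 3.5 translates directly into a procedure. Here is a minimal Python sketch, assuming $a$ and $b$ are not both zero. The extended Euclidean algorithm gives $u, v$ with $au + bv = (a,b)$. Scaling by $c/(a,b)$ gives a first solution $r, s$, and the returned function enumerates the rest. The names `ext_gcd` and `solve_linear` are illustrative.

```python
def ext_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b) >= 0 (recursive Euclid)."""
    if b == 0:
        return abs(a), (1 if a >= 0 else -1), 0
    g, u, v = ext_gcd(b, a % b)
    # b*u + (a % b)*v = g and a % b = a - (a // b)*b give a*v + b*(u - (a // b)*v) = g.
    return g, v, u - (a // b) * v

def solve_linear(a: int, b: int, c: int):
    """Theorem 3.5, for a, b not both zero: None if (a, b) does not divide c, and
    otherwise a function l -> (m, n) parametrizing all solutions of a*m + b*n = c."""
    g, u, v = ext_gcd(a, b)
    if c % g != 0:
        return None
    r, s = u * (c // g), v * (c // g)          # a first solution: a*r + b*s = c
    return lambda l: (r + l * (b // g), s - l * (a // g))

sol = solve_linear(4, 6, 10)
print(all(4 * m + 6 * n == 10 for m, n in map(sol, range(-5, 6))))   # True
print(solve_linear(4, 6, 9))                                         # None
```

The first solution found this way need not be $(1, 1)$ as in the worked example. It differs from that one by a shift in $\ell$.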


In the discussion above we saw that it is best to “reduce” this to the case when 
(a,b) = 1. 


Corollary 3.6.1. Let $a, b, c$ be given integers with $(a,b) = 1$. Given a first solution
in integers $r, s$ to $ar + bs = c$, all integer solutions to $am + bn = c$ are then given
by the formula
$$m = r + b\ell, \quad n = s - a\ell \quad \text{for some integer } \ell.$$


Deduction of Theorem from Corollary. If there is a solution in integers $m, n$ to
$am + bn = c$, then $g := (a,b)$ divides $a$, $b$, and hence $am + bn = c$, so we can
write $a = Ag$, $b = Bg$, $c = Cg$ for some integers $A, B, C$ with $(A, B) = 1$. We now
determine the integer solutions to $Am + Bn = C$, where $(A, B) = 1$, by Corollary 3.6.1.


Proof #1 of Corollary 3.6.1. If
$$am + bn = c = ar + bs,$$
then
$$a(m - r) = b(s - n).$$
We therefore need to find all integer solutions $u, v$ to
$$au = bv.$$
In any given solution $a$ divides $v$ by Euclid’s Lemma as $(a,b) = 1$, and so we may
write $v = a\ell$ for some integer $\ell$ and deduce that $u = b\ell$. Taking $u = m - r$ and
$v = s - n$, we then deduce the claimed parametrization of integer solutions to $am + bn = c$.


Exercise 3.6.1. Show that if there exists a solution in integers $m, n$ to $am + bn = c$ with $(a, b) = 1$,
then there exists a solution with $0 \leq m < b$.


Proof #2 of Corollary. There is an inverse to $a$ (mod $b$), as $(a,b) = 1$;
call it $r$. Let $m$ be any integer $\equiv rc \pmod{b}$, so that $am \equiv arc \equiv c \pmod{b}$, and
therefore there exists an integer $n$ for which $am + bn = c$. The result follows.
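Proof #2 is also an algorithm. Here is a minimal Python sketch, assuming $b > 0$; the name `solution_via_inverse` is hypothetical. Taking the least non-negative representative of $rc$ (mod $b$) also yields a solution with $0 \leq m < b$.

```python
from math import gcd

def solution_via_inverse(a: int, b: int, c: int) -> tuple[int, int]:
    """Proof #2: for b > 0 and gcd(a, b) = 1, pick m ≡ r*c (mod b) with r the
    inverse of a (mod b); then b | c - a*m and n = (c - a*m) / b."""
    if b <= 0 or gcd(a, b) != 1:
        raise ValueError("this sketch assumes b > 0 and gcd(a, b) = 1")
    r = pow(a, -1, b) if b > 1 else 0     # every integer is ≡ 0 (mod 1)
    m = r * c % b                         # the representative with 0 <= m < b
    n = (c - a * m) // b                  # exact division, since a*m ≡ c (mod b)
    return m, n

print(solution_via_inverse(7, 5, 1))   # (3, -4)
```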


Exercise 3.6.2. (a) Find all solutions in integers m,n to 7m+5n=1. 
(b) Find all solutions in integers u,v to 7v — 5u = 3. 
(c) Find all solutions in integers $j, k$ to $3j - 9k = 1$.
(d) Find all solutions in integers r,s to 5r — 10s = 15. 


Exercise 3.6.3. Show that a linear equation am + bn = c where a, b, and c are given integers, 
cannot have exactly one solution in integers m,n. 


An equation involving a congruence is said to be solved when integer values
can be found for the variables so that the congruence is satisfied. For example
$6x + 5 \equiv 13 \pmod{11}$ has the unique solution $x \equiv 5 \pmod{11}$, that is, all integers
of the form $11k + 5$.
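For small moduli, such congruences can simply be solved by testing one complete set of residues. Here is a brute-force Python sketch. It is illustrative; the name `solve_congruence_brute` is hypothetical, and the search costs $m$ evaluations.

```python
def solve_congruence_brute(f, m: int) -> list[int]:
    """All x in 0..m-1 with f(x) ≡ 0 (mod m), by testing each residue class."""
    return [x for x in range(m) if f(x) % m == 0]

print(solve_congruence_brute(lambda x: 6 * x + 5 - 13, 11))   # [5]
print(solve_congruence_brute(lambda x: x * x - 1, 8))         # [1, 3, 5, 7]
```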




There is another way to interpret Theorem 3.5, which will prove to be the best
reformulation to understand what happens with quadratic equations:
Exercise 3.6.4 (The local-global principle for linear equations). Let $a, b, c$ be given non-zero
integers. There are solutions in integers $m, n$ to $am + bn = c$ if and only if there exist residue
classes $u, v$ (mod $b$) such that $au + bv \equiv c \pmod{b}$.

“Global” refers to looking over the infinite number of possibilities for integer solutions, “local”
to looking through the finite number of possibilities mod $b$. This exercise will be revisited in
exercise 3.9.13.


3.7. Congruences to several moduli 


What are the integers that satisfy given congruences to two different moduli? 


Lemma 3.7.1. Suppose that $a, A, b, B$ are integers. There exists an integer $x$ such
that both $x \equiv a \pmod{A}$ and $x \equiv b \pmod{B}$ if and only if $b \equiv a \pmod{\gcd(A, B)}$.
If there is such an integer $x$, then the two congruences hold simultaneously for all
integers $x$ belonging to a unique residue class (mod $\mathrm{lcm}[A, B]$).


Proof. The integers $x$ for which $x \equiv a \pmod{A}$ may be written in the form
$x = Ay + a$ for some integer $y$. We are therefore seeking solutions to $Ay + a = x \equiv b \pmod{B}$,
which is the same as $Ay \equiv b - a \pmod{B}$. By exercise 3.5.3(a), this has
solutions if and only if $\gcd(A, B)$ divides $b - a$. Moreover exercise 3.5.3(b) implies
that $y$ is a solution if and only if it is of the form $u + n\cdot B/(A, B)$ for some initial
solution $u$ and any integer $n$. Therefore $x$ must be of the form
$$x = Ay + a = A\left(u + n\cdot\frac{B}{(A,B)}\right) + a = v + n\cdot\mathrm{lcm}[A, B],$$
where $v = Au + a$, since $A\cdot B/(A, B) = \mathrm{lcm}[A, B]$ by Corollary.
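The proof is constructive, so it can be turned into a procedure for combining two congruences whose moduli need not be coprime. Here is a minimal Python sketch, assuming $A, B > 0$. The name `combine` is hypothetical. The sketch uses Python's `pow(x, -1, n)` (3.8+) to invert $A/g$ modulo $B/g$.

```python
from math import gcd

def combine(a: int, A: int, b: int, B: int):
    """Lemma 3.7.1 for A, B > 0: return (v, L) such that x ≡ a (mod A) and
    x ≡ b (mod B) exactly when x ≡ v (mod L), L = lcm[A, B]; or None if
    gcd(A, B) does not divide b - a."""
    g = gcd(A, B)
    if (b - a) % g != 0:
        return None
    Ag, Bg = A // g, B // g
    # Solve A*y ≡ b - a (mod B): divide through by g, then invert A/g mod B/g.
    y = (b - a) // g * pow(Ag, -1, Bg) % Bg if Bg > 1 else 0
    L = Ag * B                                  # lcm[A, B] = A*B / gcd(A, B)
    return (A * y + a) % L, L

print(combine(1, 4, 3, 6))   # (9, 12)
print(combine(1, 4, 2, 6))   # None
```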


The generalization of this last result is most elegant when we restrict to moduli 
that are pairwise coprime. We prepare with the following exercises: 


Exercise 3.7.1. Determine all integers n for which n = 101 (mod 7!') and n= 101 (mod 131”), 
in terms of one congruence. 


Exercise 3.7.2. Suppose that a,b,c,... are pairwise coprime integers. 
(a) Prove that if a,b,c,... each divide m, then abc... divides m. 
(b) Deduce that if $m \equiv n \pmod{a}$ and $m \equiv n \pmod{b}$ and $m \equiv n \pmod{c}$, ..., then
$m \equiv n \pmod{abc\cdots}$.


Theorem 3.6 (The Chinese Remainder Theorem). Suppose that $m_1, \ldots, m_k$
are a set of pairwise coprime positive integers. For any set of residue classes
$$a_1 \pmod{m_1},\ a_2 \pmod{m_2},\ \ldots,\ a_k \pmod{m_k},$$
there exists a unique residue class $x$ (mod $m$), where $m = m_1 m_2 \cdots m_k$, for which
$$x \equiv a_j \pmod{m_j} \quad\text{for each } j.$$


Proof. We can map $x$ (mod $m$) to the vector $(x \bmod m_1, x \bmod m_2, \ldots, x \bmod m_k)$.
There are $m_1 m_2 \cdots m_k = m$ different such vectors and each different $x$ mod
$m$ maps to a different one, for if $x \equiv y \pmod{m_j}$ for each $j$, then $x \equiv y \pmod{m}$ by
exercise 3.7.2(b). Hence there is a 1-to-1 correspondence between residue
classes mod $m$ and vectors, which implies the result.




This is known as the Chinese Remainder Theorem because of the ancient Chinese practice 
(as discussed in Sun Tzu’s 4th-century Classic Calculations) of counting the number of 
soldiers in a platoon by having them line up in three columns and seeing how many are 
left over, then in five columns and seeing how many are left over, and finally in seven 
columns and seeing how many are left over, etc. For instance, if there are a hundred
soldiers, then there should be 1, 0, and 2 soldiers left over, respectively,⁹ and the next
smallest number of soldiers one would need for this to be true is 205 (since 205 is the
next smallest positive integer $\equiv 100 \pmod{105}$). Presumably an experienced commander
can eyeball the difference between 100 soldiers and 205! Primary school children in China 
learn a song that celebrates this contribution. 


We can make the Chinese Remainder Theorem a practical tool by giving a
formula to determine $x$, given $a_1, a_2, \ldots, a_k$: Since $(m/m_j, m_j) = 1$ there exists an
integer $b_j$ such that $b_j\cdot\frac{m}{m_j} \equiv 1 \pmod{m_j}$ for each $j$, by Corollary. Then
$$x \equiv a_1 b_1 \frac{m}{m_1} + a_2 b_2 \frac{m}{m_2} + \cdots + a_k b_k \frac{m}{m_k} \pmod{m}. \tag{3.7.1}$$
This works because $m_j$ divides $m/m_i$ for each $i \neq j$, and so
$$x \equiv 0 + \cdots + 0 + a_j\cdot b_j\frac{m}{m_j} + 0 + \cdots + 0 \equiv a_j\cdot 1 = a_j \pmod{m_j}$$
for each $j$. The $b_j$ can all be determined using the Euclidean algorithm, so $x$ can
be determined rapidly in practice.
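Formula (3.7.1) can be sketched in Python as follows. The sketch is illustrative; `crt` is a hypothetical name and the moduli are assumed pairwise coprime. Here `pow(M_j, -1, m_j)` from Python 3.8+ stands in for the Euclidean-algorithm computation of $b_j$. The usage line recovers the hundred soldiers from the example above.

```python
from math import prod

def crt(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Formula (3.7.1): for pairwise coprime m_1, ..., m_k return (x, m), m = m_1*...*m_k,
    with x ≡ a_j (mod m_j) for each j."""
    m = prod(moduli)
    x = 0
    for a_j, m_j in zip(residues, moduli):
        M_j = m // m_j                                 # m / m_j
        b_j = pow(M_j, -1, m_j) if m_j > 1 else 0      # b_j * (m/m_j) ≡ 1 (mod m_j)
        x += a_j * b_j * M_j
    return x % m, m

print(crt([1, 0, 2], [3, 5, 7]))   # (100, 105)
```

`pow` raises `ValueError` if some $m/m_j$ is not invertible modulo $m_j$, which happens exactly when the moduli fail to be pairwise coprime.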


Exercise 3.7.3.† Use this method to give a general formula for $x$ (mod 1001) when $x \equiv a \pmod{7}$,
$x \equiv b \pmod{11}$, and $x \equiv c \pmod{13}$.


Exercise 3.7.4.† Find the smallest positive integer $n$ which can be written as $n = 2a^2 = 3b^3 = 5c^5$
for some integers $a, b, c$.


There is more discussion of the Chinese Remainder Theorem in section 3.14 of
appendix 3B, in particular in the more difficult case in which the $m_j$'s have common
factors:


Exercise 3.7.5.† Given residue classes $a_1 \pmod{m_1}, \ldots, a_k \pmod{m_k}$, let $m = \mathrm{lcm}[m_1, \ldots, m_k]$.
Prove that there exists a residue class $b$ (mod $m$) for which $b \equiv a_j \pmod{m_j}$ for each $j$ if and
only if $a_i \equiv a_j \pmod{(m_i, m_j)}$ for all $i \neq j$.


Moreover in appendix 3C we explain how the Chinese Remainder Theorem can 
be extended to, and understood in, the more general and natural context of group 
theory. 


Exercise 3.7.6. (a) Prove that each of $a, b, c, \ldots$ divides $m$ if and only if $\mathrm{lcm}[a, b, c, \ldots]$ divides
$m$.
(b) Deduce that if $m \equiv n \pmod{a}$ and $m \equiv n \pmod{b}$ and ..., then $m \equiv n \pmod{\mathrm{lcm}[a, b, \ldots]}$.
(c) Prove that if $b$ (mod $m$) in exercise exists, then it is unique.


Exercise 3.7.7.† Let $M, N, g$ be positive integers with $(M, N, g) = 1$. Prove that the set of
residues $\{aN + bM \pmod{g} : 0 \leq a, b \leq g - 1\}$ is precisely $g$ copies of the complete set of residues
mod $g$.


9 Since $100 \equiv 1 \pmod{3}$, $\equiv 0 \pmod{5}$, and $\equiv 2 \pmod{7}$.




Exercise 3.7.8. (a) Prove that for any odd integer $m$ there are infinitely many integers $n$ for
which $(n, m) = (n + 1, m) = 1$.

(b) Why is this false if $m$ is even?

(c) Prove that for any integer $m$ there are infinitely many integers $n$ for which $(n, m) = (n + 2, m) = 1$.

(d)* Let $a_1 < a_2 < \cdots < a_k$ be given integers. Give an “if and only if” criterion in terms of the
$a_i$ (mod $p$), for each prime $p$ dividing $m$, to determine whether there are infinitely many
integers $n$ for which $(n + a_1, m) = (n + a_2, m) = \cdots = (n + a_k, m) = 1$.


Exercise 3.7.9. Prove that there exist one million consecutive integers, each of which is divisible 
by the cube of an integer > 1. 


3.8. Square roots of 1 (mod n) 
We begin by noting 


Lemma 3.8.1. If $p$ is an odd prime, then there are exactly two square roots of 1
(mod $p$), namely 1 and $-1$.


Proof. If $x^2 \equiv 1 \pmod{p}$, then $p \mid (x^2 - 1) = (x - 1)(x + 1)$ and so $p$ divides either
$x - 1$ or $x + 1$ by Theorem 3.1. Hence $x \equiv 1$ or $-1 \pmod{p}$, and these two residues are distinct since $p > 2$.


There can be more than two square roots of 1 if the modulus is composite.
For example, 1, 3, 5, and 7 are all roots of $x^2 \equiv 1 \pmod{8}$, while $1, 4, -4$, and $-1$
are all roots of $x^2 \equiv 1 \pmod{15}$, and $\pm 1, \pm 29, \pm 34, \pm 41$ are all square roots of 1
(mod 105). How can we find all of these solutions?


By the Chinese Remainder Theorem, $x$ is a root of $x^2 \equiv 1 \pmod{15}$ if and
only if $x^2 \equiv 1 \pmod{3}$ and $x^2 \equiv 1 \pmod{5}$. But, by Lemma 3.8.1, this happens if
and only if $x \equiv 1$ or $-1 \pmod{3}$ and $x \equiv 1$ or $-1 \pmod{5}$. There are therefore
four possibilities for $x$ (mod 15), given by making the choices

$x \equiv 1 \pmod{3}$ and $x \equiv 1 \pmod{5}$, which imply $x \equiv 1 \pmod{15}$;
$x \equiv -1 \pmod{3}$ and $x \equiv -1 \pmod{5}$, which imply $x \equiv -1 \pmod{15}$;
$x \equiv 1 \pmod{3}$ and $x \equiv -1 \pmod{5}$, which imply $x \equiv 4 \pmod{15}$;
$x \equiv -1 \pmod{3}$ and $x \equiv 1 \pmod{5}$, which imply $x \equiv -4 \pmod{15}$,


the last two giving the less obvious solutions. This proof generalizes in a straight- 
forward way: 


Proposition 3.8.1. If $m$ is an odd integer with $k$ distinct prime factors, then there
are exactly $2^k$ solutions $x$ (mod $m$) to the congruence $x^2 \equiv 1 \pmod{m}$.


Proof. Lemma 3.8.1 proves the result for $m$ prime. What if $m = p^e$ is a power
of an odd prime $p$? If $x^2 \equiv 1 \pmod{p^e}$, then $p \mid (x^2 - 1) = (x - 1)(x + 1)$ and so
$p$ divides either $x - 1$ or $x + 1$ by Theorem. However $p$ cannot divide both,
or else $p$ divides their difference, which is 2. Now suppose that $p$ does not divide
$x + 1$. Since $p^e \mid (x^2 - 1) = (x - 1)(x + 1)$ and $(p^e, x + 1) = 1$, we deduce that $p^e \mid (x - 1)$
by Euclid’s Lemma. Similarly, if $p$ does not divide $x - 1$, then $p^e \mid (x + 1)$. Therefore $x \equiv -1$ or
$1 \pmod{p^e}$.
Now, suppose that $a$ is an integer for which
$$a^2 \equiv 1 \pmod{m},$$




where $m = p_1^{e_1} \cdots p_k^{e_k}$, the $p_j$ are distinct odd primes and the $e_j \geq 1$. By the
Chinese Remainder Theorem, this is equivalent to $a$ satisfying
$$a^2 \equiv 1 \pmod{p_j^{e_j}} \quad\text{for } j = 1, 2, \ldots, k.$$
By the first paragraph, this is, in turn, equivalent to
$$a \equiv 1 \text{ or } -1 \pmod{p_j^{e_j}} \quad\text{for } j = 1, 2, \ldots, k.$$
By the Chinese Remainder Theorem, each choice of $a$ (mod $p_1^{e_1}$), ..., $a$ (mod $p_k^{e_k}$)
gives rise to a different value of $a$ (mod $m$) that will satisfy the congruence $a^2 \equiv 1 \pmod{m}$.
Therefore there are exactly $2^k$ distinct solutions.
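Proposition 3.8.1 is easy to test numerically. The brute-force Python check below is an illustration only. Both helper names are hypothetical, and the search is practical only for small $m$. The check lists the square roots of 1 mod 15 and compares the count with $2^k$ for odd $m < 400$.

```python
def square_roots_of_one(m: int) -> list[int]:
    """All x in 0..m-1 with x^2 ≡ 1 (mod m), by direct search (for checking only)."""
    return [x for x in range(m) if (x * x - 1) % m == 0]

def distinct_prime_factors(n: int) -> int:
    """k = the number of distinct primes dividing n >= 1, by trial division."""
    k, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            k += 1
            while n % p == 0:
                n //= p
        p += 1
    return k + (1 if n > 1 else 0)

print(square_roots_of_one(15))   # [1, 4, 11, 14], that is, ±1 and ±4 (mod 15)
for m in range(3, 400, 2):       # odd moduli only, as in Proposition 3.8.1
    assert len(square_roots_of_one(m)) == 2 ** distinct_prime_factors(m)
```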


Proposition 3.8.1 is, in effect, an algorithm for finding all of the square roots
of 1 (mod $m$), provided one knows the factorization of $m$. Conversely, in section
10.1 we will see that if we are able to find square roots mod $m$, then we are able
to factor $m$.


Exercise 3.8.1. Prove that if $(x, 6) = 1$, then $x^2 \equiv 1 \pmod{24}$ without working mod 24. You
are allowed to work mod 8 and mod 3.


Exercise 3.8.2. (a) What are the roots of $x^2 \equiv 1 \pmod{2^e}$ for each integer $e \geq 1$? (This
must be different from the odd prime case since $x^2 \equiv 1 \pmod{8}$ has four solutions, 1, 3, 5, 7
(mod 8).)
(b)* Prove that if $m$ has $k$ distinct prime factors, there are exactly $2^{k+\delta}$ solutions $x$ (mod $m$)
to the congruence $x^2 \equiv 1 \pmod{m}$, where, if $2^e \| m$, then $\delta = 0$ if $e = 0$ or 2, $\delta = -1$ if
$e = 1$, and $\delta = 1$ if $e \geq 3$.
(c) Deduce that the product of the square roots of 1 (mod $2^e$) equals 1 (mod $2^e$) if $e \geq 3$.


Exercise 3.8.3.† Prove that the product of the square roots of 1 (mod $m$) equals 1 (mod $m$),
unless $m = 4$ or $m = p^e$ or $m = 2p^e$ for some power $p^e$ of an odd prime $p$, in which case it equals
$-1$ (mod $m$).


In Gauss’s 1801 book he gives an explicit practical example of the Chinese Remainder 
Theorem. Before pocket watches and cheap printing, people were more aware of solar 
cycles and the moon’s phases than what year it actually was. Moreover, from Roman times 
to Gauss’s childhood, taxes were hard to collect since travel was difficult and expensive 
and so were not paid annually but rather on a multiyear cycle. Gauss explained how to 
use the Chinese Remainder Theorem to deduce the year in the Julian calendar from these 
three pieces of information: 


• The indiction was used from 312 to 1806 to specify the position of the year in a
15-year taxation cycle. The indiction is $\equiv \text{year} + 3 \pmod{15}$.

• The moon’s phases and the days of the year repeat themselves every 19 years.¹⁰
The golden number, which is $\equiv \text{year} + 1 \pmod{19}$, indicates where one is in that cycle of
19 years (and is still used to calculate the correct date for Easter).

• The days of the week and the dates of the year repeat in cycles of 28 years in the
Julian calendar.¹¹ The solar cycle, which is $\equiv \text{year} + 9 \pmod{28}$, indicates where one is
in this cycle of 28 years.


10 Meton of Athens, in the 5th century BC, observed that 19 (solar) years is less than two hours
out from being a whole number of lunar months.
11 Since there are seven days in a week and leap years occur every four years.




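Taking $m_1 = 15$, $m_2 = 19$, $m_3 = 28$, we observe that
$$b_1 \equiv \frac{1}{19\cdot 28} \equiv \frac{1}{4\cdot(-2)} = \frac{1}{-8} \equiv -2 \pmod{15} \quad\text{and}\quad b_1\cdot\frac{m}{m_1} = -2\cdot 19\cdot 28 = -1064,$$
$$b_2 \equiv \frac{1}{15\cdot 28} \equiv \frac{1}{(-4)\cdot 9} = \frac{1}{-36} \equiv \frac{1}{2} \equiv 10 \pmod{19} \quad\text{and}\quad b_2\cdot\frac{m}{m_2} = 10\cdot 15\cdot 28 = 4200,$$
$$b_3 \equiv \frac{1}{15\cdot 19} = \frac{1}{(14+1)\cdot 19} \equiv \frac{1}{14+19} \equiv \frac{1}{5} \equiv -11 \pmod{28} \quad\text{and}\quad b_3\cdot\frac{m}{m_3} = -11\cdot 15\cdot 19 = -3135.$$
Hence, by (3.7.1), the number $-1064a + 4200b - 3135c$ is $\equiv a \pmod{15}$, $\equiv b \pmod{19}$, and
$\equiv c \pmod{28}$. Therefore if the indiction is $a$, the golden number is $b$, and the solar cycle is $c$, then
year $+\,4713$ has exactly these residues (as $4713 \equiv 3 \pmod{15}$, $4713 \equiv 1 \pmod{19}$, and
$4713 \equiv 9 \pmod{28}$), and so the year is
$$\text{year} \equiv -1064a + 4200b - 3135c - 4713 \pmod{7980}.$$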
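The calendar computation can be checked in Python. This sketch uses the illustrative names `julian_year` and `cycles`. It follows the residues as defined in the bullets above, with representatives $0, 1, \ldots$ rather than the traditional $1, \ldots, 15$ (and so on); that makes no difference modulo 7980. The combination $-1064a + 4200b - 3135c$ is $\equiv a, b, c$ modulo 15, 19, 28, while the year itself is $\equiv a - 3, b - 1, c - 9$. Since $4713 \equiv 3, 1, 9$ modulo those numbers, the sketch subtracts 4713.

```python
def julian_year(indiction: int, golden: int, solar: int) -> int:
    """Year (mod 7980) from the indiction a, golden number b and solar cycle c:
    year + 4713 ≡ -1064a + 4200b - 3135c (mod 7980)."""
    x = -1064 * indiction + 4200 * golden - 3135 * solar
    return (x - 4713) % 7980

def cycles(year: int) -> tuple[int, int, int]:
    """(indiction, golden number, solar cycle) as residues mod 15, 19, 28."""
    return (year + 3) % 15, (year + 1) % 19, (year + 9) % 28

print(julian_year(*cycles(1801)))   # 1801
```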


Additional exercises 


Exercise 3.9.1. Prove that if $2^n - 1$ is prime, then $n$ must be prime.


Exercise 3.9.2. Suppose that $0 \leq x_0 \leq x_1 \leq \cdots$ is a division sequence (that is, $x_m \mid x_n$ whenever
$m \mid n$; see exercise 1.7.22), with $x_{n+1} > x_n$ whenever $n \geq n_0$ $(\geq 1)$. Prove that if $x_n$ is prime for
some integer $n > n_0^2$, then $n$ is prime.


We can apply exercise 3.9.2 to the Mersenne numbers $M_n = 2^n - 1$, with $n_0 = 1$,
so that if $M_n$ is prime, then $n$ is prime; and to the Fibonacci numbers with $n_0 = 2$,
so that if $F_n$ is prime, then $n$ is prime or $n = 4$.
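Here is a small numerical check of both consequences. It is illustrative only. The helper `is_prime` uses naive trial division, which is adequate for these sizes.

```python
def is_prime(n: int) -> bool:
    """Naive trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Indices n <= 31 for which M_n = 2^n - 1 is prime: every such n is prime.
mersenne = [n for n in range(1, 32) if is_prime(2 ** n - 1)]
print(mersenne)            # [2, 3, 5, 7, 13, 17, 19, 31]

# Indices n <= 40 for which F_n is prime (F_0 = 0, F_1 = 1): n is prime or n = 4.
fib = [0, 1]
while len(fib) <= 40:
    fib.append(fib[-1] + fib[-2])
fib_prime_indices = [n for n in range(41) if is_prime(fib[n])]
print(fib_prime_indices)   # [3, 4, 5, 7, 11, 13, 17, 23, 29]
```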


Exercise 3.9.3. We introduced the companion sequence $(y_n)_{n\geq 0}$ of the Lucas sequence $(x_n)_{n\geq 0}$
in exercise 0.1.4. Note that $y_1 = a$ does not necessarily divide $y_2 = a^2 + 2b$.
(a)† Prove that $y_m$ divides $y_n$ whenever $m$ divides $n$ and $n/m$ is odd.
(b) Assume that $a > 1$ and $b > 0$. Deduce that if $y_n$ is prime, then $n$ must be a power of 2.
(c) Deduce that if $2^n + 1$ is prime, then it must be a Fermat number.


Exercise 3.9.4.† Prove that the Fundamental Theorem of Arithmetic implies that for any finite
set of primes $P$, the numbers $\log p$, $p \in P$, are linearly independent¹² over $\mathbb{Q}$.


Exercise 3.9.5.† Prove that $\gcd(a, b, c)\cdot\mathrm{lcm}[a, b, c] = abc$ if and only if $a$, $b$, and $c$ are pairwise
coprime.


Exercise 3.9.6.† Prove that if $a$ and $b$ are positive integers whose product is a square and whose
difference is a prime $p$, then $a + b = (p^2 + 1)/2$.


Exercise 3.9.7. Let $p$ be an odd prime and $x$, $y$, and $z$ pairwise coprime, positive integers.
(a)† Prove that $\frac{z^p - y^p}{z - y} \equiv p\,y^{p-1} \pmod{z - y}$.
(b) Deduce that $\gcd\left(\frac{z^p - y^p}{z - y},\ z - y\right) = 1$ or $p$.

(This problem is continued in exercise 7.10.6.)


Exercise 3.9.8. Suppose that $f(x) \in \mathbb{Z}[x]$ is monic and $f(0) = 1$. Prove that if $r \in \mathbb{Q}$ and
$f(r) = 0$, then $r = 1$ or $-1$.


Exercise 3.9.9 (Another proof that $\sqrt{2}$ is irrational). Suppose that $\sqrt{2} = a/b$ where $a$ and $b$ are
coprime integers, so that $a^2 = 2b^2$.

(a) Prove that 3 cannot divide $b$, and so let $c \equiv a/b \pmod{3}$.

(b) Prove that $c^2 \equiv 2 \pmod{3}$, and therefore obtain a contradiction.


12 $x_1, \ldots, x_k$ are linearly dependent over $\mathbb{Q}$ if there exist rational numbers $a_1, \ldots, a_k$, which are
not all zero, such that $a_1 x_1 + \cdots + a_k x_k = 0$. They are linearly independent over $\mathbb{Q}$ if they are not
linearly dependent over $\mathbb{Q}$.




Exercise 3.9.10. (a) Prove that $\sqrt{2} + \sqrt{3}$ is irrational.
(b) Prove that $\sqrt{a} + \sqrt{b}$ is irrational unless $a$ and $b$ are both squares of integers.


Exercise 3.9.11. Suppose that $d$ is an integer and $\sqrt{d}$ is rational.
(a) Show that there exists an integer $m$ such that $\sqrt{d} - m = p/q$ where $0 \leq p < q$ and $(p, q) = 1$.
(b) If $p \neq 0$, show that $\sqrt{d} + m = Q/p$ for some integer $Q$.
(c) Use (a) and (b) to establish a contradiction when $p \neq 0$.
(d) Deduce that $d = m^2$.


Reference on the many proofs that $\sqrt{2}$ is irrational


[1] John H. Conway and Joseph Shipman, Extreme proofs I: The irrationality of $\sqrt{2}$, Math. Intelligencer
35 (2013), 2-7.


We say that N can be represented by the linear form ax + by, if there exist 
integers m and n such that am+bn = N. The representation is proper if (m,n) = 1. 


Exercise 3.9.12.† In this question we prove that if $N$ can be represented by $ax + by$, then it
can be represented properly. Let $A = a/(a,b)$ and $B = b/(a,b)$. Theorem states that if
$N = ar + bs$, then all solutions to $am + bn = N$ take the form $m = r + kB$, $n = s - kA$ for some
integer $k$.

(a) Prove that $\gcd(m, n)$ divides $N$.

(b) Prove that at least one of $A$ and $B$ is not divisible by $p$, for each prime $p$.

(c) Prove that if $p \nmid A$, then there exists a residue class $k_0$ (mod $p$) such that $p \mid s - kA$ if and
only if $k \equiv k_0 \pmod{p}$. Therefore deduce that $p \nmid s - kA$ if $k \equiv k_0 + 1 \pmod{p}$. Note an
analogous result if $p \mid A$ (in which case $p \nmid B$).

(d) Deduce that there exists an integer $k$ such that, for all primes $p$ dividing $N$, either $p$ does
not divide $r + kB$ or $p$ does not divide $s - kA$ (or both).

(e) Deduce that if $m = r + kB$ and $n = s - kA$, then $N$ is properly represented by $am + bn$.


Exercise 3.9.13. Prove the following version of the local-global principle for linear equations
(exercise 3.6.4): Let $a, b, c$ be given integers. There are solutions in integers $m, n$ to $am + bn = c$
if and only if for all prime powers $p^e$ (where $p$ is prime and $e$ is an integer $\geq 1$) there exist residue
classes $u, v$ (mod $p^e$) for which $au + bv \equiv c \pmod{p^e}$.


Exercise 3.9.14. Find all solutions to 5a + 7b = 211 where a and b are positive integers. 


Exercise 3.9.15. Suppose that $f(x) \in \mathbb{Z}[x]$ and $m$ and $n$ are coprime integers.
(a) Prove that there exist integers $a$ and $b$ for which $f(a) \equiv 0 \pmod{m}$ and $f(b) \equiv 0 \pmod{n}$
if and only if there exists an integer $c$ for which $f(c) \equiv 0 \pmod{mn}$, and show that we may
take $c \equiv a \pmod{m}$ and $c \equiv b \pmod{n}$.
(b) Suppose that $p_1 < p_2 < \cdots < p_k$ are primes. Prove that there exist integers $a_1, \ldots, a_k$ such
that $f(a_i) \equiv 0 \pmod{p_i}$ for $1 \leq i \leq k$ if and only if there exists an integer $a$ such that
$f(a) \equiv 0 \pmod{p_1 p_2 \cdots p_k}$.


Adding reduced fractions. A reduced fraction takes the form a/b where a and 
b > 0 are coprime integers. We wish to better understand adding reduced fractions. 


Exercise 3.9.16.† Suppose that $m$ and $n$ are coprime integers.
(a) Prove that for any integer $c$ there exist integers $a$ and $b$ for which $\frac{c}{mn} = \frac{a}{m} + \frac{b}{n}$.


(b) Prove that there are (unique) positive integers a and b for which 2- = & 
mn m 




Exercise 3.9.17. Let m and n be given positive integers. 

(a) Prove that for any integers $a$ and $b$ there exists an integer $c$ for which $\frac{a}{m} + \frac{b}{n} = \frac{c}{L}$, where $L = \mathrm{lcm}[m, n]$.
For the denominators 3 and 6, with $L = 6$, we have the example $\frac{1}{3} + \frac{1}{6} = \frac{1}{2}$, a case in which the sum has a denominator smaller than $L$ when written as a reduced fraction. However $\frac{2}{3} + \frac{1}{6} = \frac{5}{6}$, so there are certainly examples with these denominators for which the sum has denominator $L$.

(b)* Show that $\mathrm{lcm}[m, n]$ is the smallest positive integer $L$ such that for all integers $a$ and $b$ we can write $\frac{a}{m} + \frac{b}{n}$ as a fraction with denominator $L$. (This is why $\mathrm{lcm}[m, n]$ is sometimes called the lowest (or least) common denominator of the fractions $1/m$ and $1/n$.)

(c)† Show that if $\frac{a}{m}$ and $\frac{b}{n}$ are reduced fractions whose sum has denominator less than $L$, then there must exist a prime power $p^e$ such that $p^e \| m$ and $p^e \| n$ for which $p^{e+1}$ divides $an + bm$.


Appendix 3A. Factoring binomial coefficients and Pascal's triangle modulo p


3.10. The prime powers dividing a given binomial coefficient 


Lemma 3.10.1. The power of prime $p$ that divides $n!$ is $\sum_{k \geq 1} [n/p^k]$. In other words
$$n! = \prod_{p \text{ prime}} p^{[n/p] + [n/p^2] + [n/p^3] + \cdots}.$$

Proof. We wish to determine the power of $p$ dividing $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$. If $p^k$ is the power of $p$ dividing $m$, then we will count 1 for $p$ dividing $m$, then 1 for $p^2$ dividing $m$, ..., and finally 1 for $p^k$ dividing $m$. Therefore the power of $p$ dividing $n!$ equals the number of integers $m$, $1 \leq m \leq n$, that are divisible by $p$, plus the number of integers $m$, $1 \leq m \leq n$, that are divisible by $p^2$, and so on. The result follows as there are $[n/p^j]$ integers $m$, $1 \leq m \leq n$, that are divisible by $p^j$ for each $j \geq 1$, by exercise 1.7.6(c).
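The counting in this proof is easy to check numerically. Below is a minimal Python sketch (the function names `legendre_exponent` and `exponent_by_division` are our own, for illustration) that evaluates $\sum_{k \geq 1} [n/p^k]$ and compares it with the exponent of $p$ found by repeatedly dividing $n!$ by $p$.

```python
from math import factorial

def legendre_exponent(n, p):
    """Exponent of the prime p in n!, as in Lemma 3.10.1: the sum of [n/p^k] over k >= 1."""
    total, q = 0, p
    while q <= n:
        total += n // q   # number of m, 1 <= m <= n, divisible by q = p^k
        q *= p
    return total

def exponent_by_division(N, p):
    """Exponent of p in the positive integer N, by repeated division (used only as a check)."""
    e = 0
    while N % p == 0:
        N //= p
        e += 1
    return e

print(legendre_exponent(100, 5))   # 24, since [100/5] + [100/25] = 20 + 4
print(all(legendre_exponent(n, p) == exponent_by_division(factorial(n), p)
          for n in range(1, 60) for p in (2, 3, 5, 7)))   # True
```

The loop stops as soon as $p^k > n$, since every later term $[n/p^k]$ is 0.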


Exercise 3.10.1. Write $n = n_0 + n_1 p + \cdots + n_d p^d$ in base $p$, so that each $n_i \in \{0, 1, \ldots, p-1\}$.
(a) Prove that $[n/p^k] = (n - (n_0 + n_1 p + \cdots + n_{k-1} p^{k-1}))/p^k$.
The sum of the digits of $n$ in base $p$ is defined to be $s_p(n) := n_0 + n_1 + \cdots + n_d$.
(b) Prove that the exact power of prime $p$ that divides $n!$ is $\frac{n - s_p(n)}{p - 1}$.

Theorem 3.7 (Kummer's Theorem). The exponent of the largest power of prime $p$ that divides the binomial coefficient $\binom{a+b}{a}$ equals the number of carries when adding $a$ and $b$ in base $p$.




Example. To recover the factorization of $\binom{14}{6}$ we add 6 and 8 in each prime base $< 14$:

  base p:    2      3     5     7     11    13
     6     0110    020    11    06    06    06
     8     1000    022    13    11    08    08
    14     1110    112    24    20    13    11

We see that there are no carries in base 2, 1 carry in base 3, no carries in base 5, 1 carry in base 7, 1 carry in base 11, and 1 carry in base 13, so we deduce that
$$\binom{14}{6} = 2^0 \cdot 3^1 \cdot 5^0 \cdot 7^1 \cdot 11^1 \cdot 13^1.$$


Proof. For given integer $k \geq 1$, let $q = p^k$. Then let $A$ and $B$ be the least non-negative residues of $a$ and $b \pmod{q}$, respectively, so that $0 \leq A, B \leq q - 1$. Note that $A$ and $B$ give the first $k$ digits (from the right) of $a$ and $b$ in base $p$. If $C$ is the first $k$ digits of $a + b$ in base $p$, then $C$ is the least non-negative residue of $a + b \pmod{q}$, that is, of $A + B \pmod{q}$. Now $0 \leq A + B < 2q$:

• If $A + B < q$, then $C = A + B$ and there is no carry in the $k$th digit when we add $a$ and $b$ in base $p$.

• If $A + B \geq q$, then $C = A + B - q$ and so there is a carry of 1 in the $k$th digit when we add $a$ and $b$ in base $p$.


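We need to relate these observations to the formula in Lemma 3.10.1. The trick comes in noticing that $A = a - p^k [a/p^k]$, and similarly $B = b - p^k [b/p^k]$ and $C = a + b - p^k [(a+b)/p^k]$. Therefore
$$\left[\frac{a+b}{p^k}\right] - \left[\frac{a}{p^k}\right] - \left[\frac{b}{p^k}\right] = \frac{A + B - C}{p^k} = \begin{cases} 1 & \text{if there is a carry in the } k\text{th digit,} \\ 0 & \text{if not,} \end{cases}$$
and so
$$\sum_{k \geq 1} \left( \left[\frac{a+b}{p^k}\right] - \left[\frac{a}{p^k}\right] - \left[\frac{b}{p^k}\right] \right)$$
equals the number of carries when adding $a$ and $b$ in base $p$. However Lemma 3.10.1 implies that this also equals the exact power of $p$ dividing $\frac{(a+b)!}{a!\,b!} = \binom{a+b}{a}$, and the result follows.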
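Kummer's Theorem is easy to test by computer. The following Python sketch (all names are ours, for illustration; it assumes Python 3.8 or later for `math.comb`) counts carries by schoolbook addition in base $p$ and compares the count with the exponent of $p$ in $\binom{a+b}{a}$.

```python
from math import comb

def digits(n, p):
    """Base-p digits of n >= 0, least significant digit first."""
    ds = []
    while n:
        n, r = divmod(n, p)
        ds.append(r)
    return ds

def carries(a, b, p):
    """Number of carries when adding a and b digit by digit in base p."""
    da, db = digits(a, p), digits(b, p)
    count, carry = 0, 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        carry = 1 if s >= p else 0   # s <= 2p - 1, so the carry is 0 or 1
        count += carry
    return count

def valuation(N, p):
    """Exponent of p in the positive integer N."""
    e = 0
    while N % p == 0:
        N //= p
        e += 1
    return e

# The example above: C(14, 6), adding 6 and 8 in bases 2, 3, 5, 7, 11, 13.
print([carries(6, 8, p) for p in (2, 3, 5, 7, 11, 13)])   # [0, 1, 0, 1, 1, 1]
print(comb(14, 6))                                          # 3003 = 3 * 7 * 11 * 13
```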


Exercise 3.10.2. State, with proof, the analogy to Kummer's Theorem for trinomial coefficients $n!/(a!\,b!\,c!)$ where $a + b + c = n$.


Corollary 3.10.1. If $p^e$ divides the binomial coefficient $\binom{n}{m}$, then $p^e \leq n$.
Proof. There are $k + 1$ digits in the base $p$ expansion of $n$ when $p^k \leq n < p^{k+1}$. When adding $m$ and $n - m$ there can be carries in every digit except the $(k+1)$st (the coefficient of $p^k$), since a carry out of that digit would give $n \geq p^{k+1}$. Therefore there are no more than $k$ carries when adding $m$ to $n - m$ in base $p$, so that $p^e \leq p^k \leq n$ by Kummer's Theorem.


Exercise 3.10.3. Prove that if $0 \leq k \leq n$, then $\binom{n}{k}$ divides $\mathrm{lcm}[m : 1 \leq m \leq n]$.




3.11. Pascal’s triangle modulo 2 


In section 0.3 we explained the theory and practice of constructing Pascal's triangle. We are now interested in constructing Pascal's triangle modulo 2, mod 3, mod 4, etc. To do so one can either reduce the binomial coefficients mod $m$ (for $m = 2, 3, 4, \ldots$) or one can rework Pascal's triangle, starting with a 1 in the top row and then obtaining each row from the previous one by adding the two entries immediately above the given entry, modulo $m$. For example, Pascal's triangle mod 2 starts with the rows

              1
             1 1
            1 0 1
           1 1 1 1
          1 0 0 0 1
         1 1 0 0 1 1
        1 0 1 0 1 0 1
       1 1 1 1 1 1 1 1


It is perhaps easiest to visualize this by replacing 1 (mod 2) by a dark square and, otherwise, a white square, as in the following fascinating diagram:¹³

[Figure: Pascal's triangle (mod 2)]

¹³ This and other images in this section reproduced with kind permission of Bill Cherowitzo.

One can see patterns emerging. For example the rows corresponding to $n = 1, 3, 7, 15, \ldots$ are all 1's, and the next rows, $n = 2, 4, 8, 16, \ldots$, start and end with a 1 and have all 0's in between. Even more: The two 1's at either end of row $n = 4$ seem to each be the first entry of a (four-line) triangle, which is an exact copy of the first four rows of Pascal's triangle mod 2; similarly for the two 1's at either end of row $n = 8$ and the eight-line triangles beneath (and including) them. In general if $T_k$ denotes the top $2^k$ rows of Pascal's triangle mod 2, then $T_{k+1}$ is given by a triangle of three copies of $T_k$, with an inverted triangle of zeros in the middle, as in the following diagram:

Figure 3.1. The top $2^{k+1}$ rows of Pascal's triangle mod 2, in terms of the top $2^k$ rows.


This is called self-similarity. One immediate consequence is that one can determine the number of 1's in a given row: If $2^k \leq n < 2^{k+1}$, then row $n$ consists of two copies of row $m$ ($:= n - 2^k$) with some 0's in between.
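The self-similarity can be checked by building the triangle exactly as described above, adding the two entries immediately above modulo $m$. Here is a short Python sketch (function names are ours, for illustration), which assumes $m \geq 2$.

```python
def pascal_mod(m, rows):
    """The first `rows` rows of Pascal's triangle mod m (m >= 2): each row starts and ends
    with 1, and each inner entry is the sum of the two entries above it, reduced mod m."""
    tri = [[1]]
    for _ in range(rows - 1):
        prev = tri[-1]
        tri.append([1] + [(prev[i] + prev[i + 1]) % m for i in range(len(prev) - 1)] + [1])
    return tri

def row_is_two_copies(T, n):
    """For 2^k <= n < 2^(k+1), check that row n is row m (m = n - 2^k), then zeros, then row m."""
    k = n.bit_length() - 1
    m = n - 2 ** k
    return T[n] == T[m] + [0] * (n - 2 * m - 1) + T[m]

T = pascal_mod(2, 64)
print(T[6])                                                  # [1, 0, 1, 0, 1, 0, 1]
print(all(row_is_two_copies(T, n) for n in range(1, 64)))    # True
```

Row $n$ has $n + 1$ entries and each copy of row $m$ has $m + 1$, which is why $n - 2m - 1 = 2^{k+1} - n - 1$ zeros sit in between.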


Exercise 3.11.1. Deduce that there are $2^k$ odd entries in the $n$th row of Pascal's triangle, where $k = s(n)$, the number of 1's in the binary expansion of $n$.


This self-similarity generalizes nicely for other primes p, where we again replace 
integers divisible by p by a white square, and those not divisible by p by a black 
square. 


[Figure: Pascal's triangle (mod 3), Pascal's triangle (mod 5), Pascal's triangle (mod 7)]


The top $p$ rows are all black since the entries $\binom{n}{m}$ with $0 \leq m \leq n \leq p - 1$ are never divisible by $p$. Let $T_k$ denote the top $p^k$ rows of Pascal's triangle mod $p$. Then $T_{k+1}$ is given by an array of $p$ rows of triangles, in which the $n$th row contains $n$ copies of $T_k$, with inverted triangles of 0's in between.


Pascal's triangle modulo primes $p$ is a bit more complicated; we wish to color in the black squares with one of $p - 1$ colors, each representing a different reduced residue class mod $p$. Call the top row the 0th row, and the leftmost entry of each row its 0th entry. Therefore the $m$th entry of the $n$th row is $\binom{n}{m}$. By Lucas's Theorem (exercise 2.5.10) the value of $\binom{rp^k + s}{ap^k + b} \pmod{p}$ (where $0 \leq a, r \leq p - 1$ and $0 \leq b, s \leq p^k - 1$), which is the $b$th entry of the $s$th row of the copy of $T_k$ which is the $a$th entry of the $r$th row of the copies of $T_k$ that make up $T_{k+1}$, is $\equiv \binom{r}{a}\binom{s}{b} \pmod{p}$. In other words, the values in the copy of $T_k$ which is the $a$th entry of the $r$th row of the copies of $T_k$ are $\binom{r}{a}$ times the values in $T_k$.
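The block form of Lucas's Theorem used here can be spot-checked by brute force. A minimal Python sketch (the name `lucas_block_check` is ours), assuming Python 3.8+ for `math.comb`, which returns 0 when the lower index exceeds the upper one, exactly as needed for the zero triangles:

```python
from math import comb

def lucas_block_check(p, k):
    """Check C(r*p^k + s, a*p^k + b) ≡ C(r, a) * C(s, b) (mod p)
    for all 0 <= r, a <= p - 1 and 0 <= s, b <= p^k - 1."""
    q = p ** k
    for r in range(p):
        for a in range(p):
            for s in range(q):
                for b in range(q):
                    if comb(r * q + s, a * q + b) % p != comb(r, a) * comb(s, b) % p:
                        return False
    return True

print(lucas_block_check(3, 2), lucas_block_check(5, 1), lucas_block_check(2, 3))   # True True True
```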




The odd entries in Pascal's triangle mod 4 make even more interesting patterns, but this will take us too far afield; see [1] for a detailed discussion.


Reading each row of Pascal's triangle mod 2 as the binary expansion of an integer, we obtain the numbers
$$1,\ 11_2 = 3,\ 101_2 = 5,\ 1111_2 = 15,\ 10001_2 = 17,\ 110011_2 = 51,\ 1010101_2 = 85,\ \ldots.$$
Do you recognize these numbers? If you factor them, you obtain
$$1,\ F_0,\ F_1,\ F_0F_1,\ F_2,\ F_0F_2,\ F_1F_2,\ F_0F_1F_2,\ \ldots$$
where $F_n = 2^{2^n} + 1$ are the Fermat numbers (introduced in exercise 0.4.14). It appears that all are products of Fermat numbers, and one can even guess at which Fermat numbers. For example the 6th row is $F_2F_1$ and $6 = 2^2 + 2^1$ in base 2, whereas the 7th row is $F_2F_1F_0$ and $7 = 2^2 + 2^1 + 2^0$ in base 2, and our other examples follow this same pattern. This leads to the following challenging problem:

Exercise 3.11.2. Show that the $n$th row of Pascal's triangle mod 2, considered as a binary number, is given by $\prod_{i=0}^{k} F_{n_i}$, where $n = 2^{n_0} + 2^{n_1} + \cdots + 2^{n_k}$ with $0 \leq n_0 < n_1 < \cdots < n_k$ (i.e., the binary expansion of $n$).¹⁴
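The pattern in exercise 3.11.2 can be tested numerically before one tries to prove it. This Python sketch (names are ours, for illustration) reads each row mod 2 as a binary number and compares it with the product of the Fermat numbers $F_j$ over the positions $j$ of the 1's in the binary expansion of $n$; it is evidence, not a proof.

```python
from math import comb

def row_as_binary(n):
    """Row n of Pascal's triangle mod 2, read as a binary number (rows are palindromes,
    so the reading direction does not matter)."""
    return int("".join(str(comb(n, m) % 2) for m in range(n + 1)), 2)

def fermat(j):
    """The Fermat number F_j = 2^(2^j) + 1."""
    return 2 ** (2 ** j) + 1

def fermat_product(n):
    """Product of F_j over the positions j of the 1's in the binary expansion of n."""
    result, j = 1, 0
    while n:
        if n & 1:
            result *= fermat(j)
        n >>= 1
        j += 1
    return result

print([row_as_binary(n) for n in range(8)])                            # [1, 3, 5, 15, 17, 51, 85, 255]
print(all(row_as_binary(n) == fermat_product(n) for n in range(64)))   # True
```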


References for this chapter 


[1] Andrew Granville, Zaphod Beeblebrox’s brain and the fifty-ninth row of Pascal’s triangle, Amer. 
Math. Monthly 99 (1992), 318-331. 


[2] Kathleen M. Shannon and Michael J. Bardzell, Patterns in Pascal’s Triangle - with a Twist - First 
Twist: What is It?, Convergence (December 2004). 


¹⁴ An $m$-sided regular polygon with $m$ odd is constructible with ruler and compass (see section 0.18 of appendix 0G) if and only if $m$ is the product of distinct Fermat primes. Therefore the integers created here include every odd $m$ for which the regular $m$-sided polygon is constructible.


Appendix 3B. Solving linear congruences


Gauss’s approach to composite moduli in the Chinese Remainder Theorem uses 
methods that are different from those used today, but which are no less effective: 


3.12. Composite moduli 


If the modulus $m$ is composite, then we can solve any linear congruence question, “one prime at a time”, as in the following example: To solve
$$19x \equiv 1 \pmod{140}$$
we first do so (mod 2), as 2 divides 140, to get $x \equiv 1 \pmod{2}$. Substituting $x = 1 + 2y$ into the original equation we get
$$38y \equiv -18 \pmod{140} \quad \text{or, equivalently,} \quad 19y \equiv -9 \pmod{70}.$$
Since 2 divides 70 we again view this (mod 2) to get $y \equiv 1 \pmod{2}$. Substituting $y = 1 + 2z$ into this equation we get
$$38z \equiv -28 \pmod{70} \quad \text{and thus} \quad 19z \equiv -14 \pmod{35}.$$
Viewing this (mod 5) gives $-z \equiv 1 \pmod{5}$, and so substitute $z = -1 + 5w$ to get
$$95w \equiv 5 \pmod{35} \quad \text{so that} \quad 19w \equiv 1 \pmod{7}.$$
Therefore $5w \equiv 1 \pmod{7}$ and so $w \equiv 3 \pmod{7}$, which implies, successively, that
$$z \equiv -1 + 5 \cdot 3 = 14 \pmod{35}, \quad y \equiv 1 + 2 \cdot 14 = 29 \pmod{70},$$
$$\text{and} \quad x \equiv 1 + 2 \cdot 29 = 59 \pmod{140}.$$
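The “one prime at a time” procedure can be phrased recursively. The following Python sketch is our own rendering (the names are hypothetical), under the assumption $(a, m) = 1$: it finds $x_0 \pmod{p}$ for the smallest prime $p \mid m$ by trial, substitutes $x = x_0 + py$, divides through by $p$, and recurses on the modulus $m/p$. On the example above it peels off the primes 2, 2, 5, 7 in the same order as the text (at the prime 5 it picks the representative 4 where the text uses $-1$).

```python
def smallest_prime_factor(m):
    """Smallest prime dividing m >= 2, by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m

def solve_one_prime_at_a_time(a, b, m):
    """Return x (mod m) with a*x ≡ b (mod m), peeling off one prime factor of m at a time.
    Assumes (a, m) = 1, so each step has exactly one solution mod p."""
    if m == 1:
        return 0
    p = smallest_prime_factor(m)
    # Solve a*x0 ≡ b (mod p) by trying each residue class mod p.
    x0 = next(x for x in range(p) if (a * x - b) % p == 0)
    # Substitute x = x0 + p*y: then a*p*y ≡ b - a*x0 (mod m), and we divide through by p.
    y = solve_one_prime_at_a_time(a, (b - a * x0) // p, m // p)
    return (x0 + p * y) % m

print(solve_one_prime_at_a_time(19, 1, 140))   # 59
```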




3.13. Solving linear congruences with several unknowns 


We restrict our attention to when there are as many congruences as there are unknowns, so that we aim to find all integer (vector) solutions $x \pmod{m}$ to $Ax \equiv b \pmod{m}$, where $A$ is a given $n$-by-$n$ matrix of integers and $b$ is a given vector of $n$ integers.


Let $a_i$ be the $i$th column vector of $A$, and let
$$V_j = \{ v \in \mathbb{R}^n : v \cdot a_i = 0 \text{ for all } i \neq j \}$$
be the set of vectors in $\mathbb{R}^n$ that are orthogonal to all the $a_i$ other than $a_j$. Basic linear algebra gives us that $V_j$ is itself a vector space of dimension $\geq n - (n-1) = 1$ and has a basis over $\mathbb{Q}$ made up of vectors with only integer entries (since we may multiply through any $\mathbb{Q}$-vector by some integer to make it into a $\mathbb{Z}$-vector). Hence we may take a non-zero vector in $V_j$ with integer entries and divide through by the gcd of those entries to obtain a vector $c_j$ whose entries are coprime. Therefore $c_j \cdot a_i = 0$ for all $i \neq j$. Let $d_j = c_j \cdot a_j \in \mathbb{Z}$. Let $C$ be the matrix with $j$th row vector $c_j$, and let $D$ be the diagonal matrix with $(j,j)$th entry $d_j$. Then
$$Dx = (CA)x = C(Ax) \equiv Cb \pmod{m}.$$


Let $y \equiv Cb \pmod{m}$, so that if $Ax \equiv b \pmod{m}$, then $Dx \equiv y \pmod{m}$. This has solutions if and only if there exists a solution $x_j$ to $d_j x_j \equiv y_j \pmod{m}$ for each $j$. In exercise 3.5.3 we saw that there are solutions to this last congruence if and only if $(d_j, m)$ divides $y_j$ for each $j$, and we determined how to find all solutions.


If you have studied linear algebra, then your first impulse is to define $\Delta = |\det A|$, where $\det A$ is the determinant of $A$. If $\Delta \neq 0$, then we can solve the system of equations $Ax = b$ over the rationals by taking $C = \Delta \cdot A^{-1}$, so that $\Delta x = Cb$ (in which case, we have replaced each $d_j$ by $\Delta$). However, this might not be useful in trying to solve equations mod $m$ for, if, say, $m$ divides $\Delta$, then our equation becomes $0 \equiv Cb \pmod{m}$. With Gauss's construction, the $d_j$ all divide $\Delta$ but are not necessarily equal to it, which gives Gauss's method more flexibility. We now give two examples; in the first the $|d_j|$ all equal the determinant, but in the second they are smaller.


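Example. We wish to solve
$$\begin{pmatrix} 1 & 3 & 1 \\ 4 & 1 & 5 \\ 2 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 7 \\ 3 \end{pmatrix} \pmod{8}.$$
The matrix has determinant 15. We obtain, proceeding much like we would over the integers,
$$\begin{pmatrix} 15 & 0 & 0 \\ 0 & 15 & 0 \\ 0 & 0 & 15 \end{pmatrix} x = \begin{pmatrix} -9 & -1 & 14 \\ 6 & -1 & -1 \\ 6 & 4 & -11 \end{pmatrix} \begin{pmatrix} 1 & 3 & 1 \\ 4 & 1 & 5 \\ 2 & 2 & 1 \end{pmatrix} x \equiv \begin{pmatrix} -9 & -1 & 14 \\ 6 & -1 & -1 \\ 6 & 4 & -11 \end{pmatrix} \begin{pmatrix} 1 \\ 7 \\ 3 \end{pmatrix} = \begin{pmatrix} 26 \\ -4 \\ 1 \end{pmatrix} \pmod{8},$$
so that, since $15 \equiv -1 \pmod{8}$, we have $x \equiv \begin{pmatrix} -2 \\ 4 \\ -1 \end{pmatrix} \pmod{8}$. This gives all solutions mod 8.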




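Example. We wish to solve
$$\begin{pmatrix} 3 & 5 & 1 \\ 2 & 3 & 2 \\ 5 & 1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \equiv \begin{pmatrix} 4 \\ 7 \\ 6 \end{pmatrix} \pmod{12}.$$
The matrix has determinant 28. We obtain, using Gauss's technique,
$$\begin{pmatrix} 4 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 28 \end{pmatrix} x = \begin{pmatrix} 1 & -2 & 1 \\ 1 & 1 & -1 \\ -13 & 22 & -1 \end{pmatrix} \begin{pmatrix} 3 & 5 & 1 \\ 2 & 3 & 2 \\ 5 & 1 & 3 \end{pmatrix} x \equiv \begin{pmatrix} 1 & -2 & 1 \\ 1 & 1 & -1 \\ -13 & 22 & -1 \end{pmatrix} \begin{pmatrix} 4 \\ 7 \\ 6 \end{pmatrix} = \begin{pmatrix} -4 \\ 5 \\ 96 \end{pmatrix} \pmod{12},$$
and so, if we have a solution $(x_1, x_2, x_3)$, then $x_1 \equiv -1 \pmod{3}$, $x_2 \equiv -1 \pmod{12}$, and $x_3 \equiv 0 \pmod{3}$. To obtain all solutions mod 12 we substitute $x_1 = 2 + 3t$, $x_2 = -1$, $x_3 = 3u$ into the original equations, so that
$$\begin{pmatrix} 3 & 1 \\ 2 & 2 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} t \\ u \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \pmod{4},$$
which is equivalent to $t \equiv u - 1 \pmod{4}$. Therefore all solutions are given by
$$x_3 \equiv 0 \pmod{3} \quad \text{with} \quad x_1 \equiv x_3 - 1 \quad \text{and} \quad x_2 \equiv -1 \pmod{12}.$$
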
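Gauss's construction of the $c_j$ can be carried out mechanically. A standard fact from linear algebra (not stated in the text) is that the $j$th row of the adjugate of $A$ is orthogonal to every column $a_i$ with $i \neq j$, so dividing it by the gcd of its entries gives $c_j$. Here is a minimal Python sketch built on that observation (assuming $\det A \neq 0$; determinants by cofactor expansion, fine only for small $n$; all names are ours), together with a brute-force search used only to check the two examples.

```python
from math import gcd
from functools import reduce
from itertools import product

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gauss_rows(A):
    """Return (C, d): C has primitive integer rows c_j with c_j . a_i = 0 for i != j
    (a_i = ith column of A), and d_j = c_j . a_j, so that C A = diag(d_1, ..., d_n).
    Assumes det A != 0, so that each row of the adjugate is non-zero."""
    n = len(A)
    C, d = [], []
    for j in range(n):
        # Row j of adj(A): the (i, j) cofactors of A, for i = 0, ..., n - 1.
        row = [(-1) ** (i + j) * det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
               for i in range(n)]
        g = reduce(gcd, row)
        row = [v // g for v in row]
        C.append(row)
        d.append(sum(row[i] * A[i][j] for i in range(n)))
    return C, d

def reduced_system(A, b, m):
    """Any solution of A x ≡ b (mod m) satisfies d_j x_j ≡ y_j (mod m), where y = C b (mod m)."""
    C, d = gauss_rows(A)
    y = [sum(c * bi for c, bi in zip(row, b)) % m for row in C]
    return d, y

def all_solutions(A, b, m):
    """Brute-force list of all x (mod m) with A x ≡ b (mod m), for checking small cases."""
    n = len(A)
    return [x for x in product(range(m), repeat=n)
            if all((sum(A[i][k] * x[k] for k in range(n)) - b[i]) % m == 0 for i in range(n))]

A1, b1 = [[1, 3, 1], [4, 1, 5], [2, 2, 1]], [1, 7, 3]
A2, b2 = [[3, 5, 1], [2, 3, 2], [5, 1, 3]], [4, 7, 6]
print(gauss_rows(A2))              # ([[1, -2, 1], [1, 1, -1], [-13, 22, -1]], [4, 7, 28])
print(reduced_system(A2, b2, 12))  # ([4, 7, 28], [8, 5, 0])
print(all_solutions(A1, b1, 8))    # [(6, 4, 7)], i.e. x ≡ (-2, 4, -1) (mod 8)
```

The reduced system only gives necessary conditions; as in the second example, one still substitutes back to find exactly which candidates solve the original congruences.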
3.14. The Chinese Remainder Theorem in general 
When the moduli are not coprime 


We proved the Chinese Remainder Theorem for any two moduli in Lemma 3.7.1, but for arbitrarily many moduli only when they are pairwise coprime, in Theorem 3.6. We now give the full, more complicated, statement for arbitrary moduli:


Theorem 3.8 (The Chinese Remainder Theorem, revisited). Suppose that we are given positive integers $m_1, m_2, \ldots, m_k$ and any residue classes
$$a_1 \pmod{m_1},\ a_2 \pmod{m_2},\ \ldots,\ a_k \pmod{m_k}. \tag{3.14.1}$$
There exists a unique residue class $x \bmod m = \mathrm{lcm}[m_1, m_2, \ldots, m_k]$ for which
$$x \equiv a_j \pmod{m_j} \text{ for each } j \quad \text{if and only if} \quad a_i \equiv a_j \pmod{(m_i, m_j)} \text{ for all } i \neq j.$$


Proof. If there is a solution $x$, then $a_i \equiv x \equiv a_j \pmod{(m_i, m_j)}$ for all $i \neq j$, since $(m_i, m_j)$ divides both $m_i$ and $m_j$.

Now assume that $a_i \equiv a_j \pmod{(m_i, m_j)}$ for all $i \neq j$. We will prove the result by induction on $k \geq 1$. For $k = 1$ there is nothing to prove, so now assume that $k \geq 2$. By the induction hypothesis there exists a unique residue class $a_0 \bmod m_0 := \mathrm{lcm}[m_2, \ldots, m_k]$ for which $a_0 \equiv a_j \pmod{m_j}$ for each $j \geq 2$.

Now $a_1 \equiv a_j \equiv a_0 \pmod{(m_1, m_j)}$, and so $(m_1, m_j)$ divides $a_1 - a_0$, for each $j \geq 2$. Therefore $\mathrm{lcm}[(m_1, m_j) : 2 \leq j \leq k]$, which equals $(m_1, m_0)$ (by exercise 3.14.1), divides $a_1 - a_0$; that is, $a_0 \equiv a_1 \pmod{(m_0, m_1)}$. By Lemma 3.7.1 there exists a unique residue class $x \bmod \mathrm{lcm}[m_0, m_1] = m$ for which $x \equiv a_1 \pmod{m_1}$ and $x \equiv a_0 \pmod{m_0}$, which is $\equiv a_j \pmod{m_j}$ for each $j \geq 2$.

Moreover, exercise 3.7.6 also implies that if there is a residue class $x \pmod{m}$ which belongs to all of the residue classes in (3.14.1), then it is unique.


Exercise 3.14.1. Prove that $\mathrm{lcm}[\gcd(m, n_j) : 1 \leq j \leq k] = \gcd(m, \mathrm{lcm}[n_j : 1 \leq j \leq k])$.


Finding the solution modulo $m$. The Chinese Remainder Theorem shows that there is a solution $x \pmod{m}$ but not how to find it efficiently. We discuss algorithms to do so in this subsection.


Example. Can one find integers $z$ for which $z \equiv -4 \pmod{35}$, $z \equiv 17 \pmod{504}$, and $z \equiv 1 \pmod{16}$? The first two congruences combine to give $z \equiv 521 \pmod{2520}$. Combining this with the congruence $z \equiv 1 \pmod{16}$, we get $z \equiv 3041 \pmod{5040}$.

Given $a_1 \pmod{m_1}, \ldots, a_k \pmod{m_k}$ for which $a_i \equiv a_j \pmod{(m_i, m_j)}$ for all $i \neq j$, we now show how to find $x \bmod m = [m_1, \ldots, m_k]$ for which $x \equiv a_j \pmod{m_j}$ for each $j$: For each prime power $p^e \| m$ there exists an index $j(p)$ such that $p^e \mid m_{j(p)}$. We can then determine $x \pmod{m}$ from the original Chinese Remainder Theorem, using the congruences $x \equiv a_{j(p)} \pmod{p^e}$ for all prime powers $p^e \| m$.


We now use this technique on the example above. The three congruences may be rewritten in terms of prime power divisors as
$$z \equiv -4 \pmod{35} \iff z \equiv -4 \equiv 1 \pmod{5} \text{ and } z \equiv -4 \equiv 3 \pmod{7},$$
$$z \equiv 17 \pmod{504} \iff z \equiv 1 \pmod{8},\ z \equiv -1 \pmod{9}, \text{ and } z \equiv 3 \pmod{7},$$
$$z \equiv 1 \pmod{16} \iff z \equiv 1 \pmod{16}.$$
We see that $m = 2^4 \cdot 3^2 \cdot 5 \cdot 7 = 5040$, and the congruences with the largest prime powers are
$$z \equiv 1 \pmod{16},\quad z \equiv -1 \pmod{9},\quad z \equiv 1 \pmod{5},\quad \text{and} \quad z \equiv 3 \pmod{7}.$$
These are consistent with all six congruences above and combine to give $z \equiv 3041 \pmod{5040}$. This algorithm requires one to factor the moduli, which might be impractical with large moduli (see section 10.3).
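The prime-power method is short to code once the moduli are factored. In this Python sketch (names are ours; trial division stands in for a real factoring algorithm, and the input congruences are assumed to be pairwise consistent, which is not checked), each prime keeps the congruence whose modulus contains the largest power of that prime, and the resulting coprime congruences are combined with the usual formula, using `pow(M, -1, q)` for modular inverses (Python 3.8+).

```python
from math import prod

def factor(n):
    """Trial-division factorization of n >= 1 as a dict {p: e}."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def crt_via_prime_powers(congruences):
    """congruences: list of pairs (a_j, m_j), assumed pairwise consistent.
    Returns (x, m) with x ≡ a_j (mod m_j) for all j and m = lcm of the m_j."""
    best = {}   # p -> (largest p^e dividing some m_j, a_j reduced mod that p^e)
    for a, m in congruences:
        for p, e in factor(m).items():
            q = p ** e
            if p not in best or q > best[p][0]:
                best[p] = (q, a % q)
    M = prod(q for q, _ in best.values())
    x = 0
    for q, r in best.values():
        Mq = M // q
        x += r * Mq * pow(Mq, -1, q)   # ≡ r (mod q), and ≡ 0 mod the other prime powers
    return x % M, M

print(crt_via_prime_powers([(-4, 35), (17, 504), (1, 16)]))   # (3041, 5040)
```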


We now proceed without factoring to combine the congruences $x \equiv a \pmod{m}$ and $x \equiv b \pmod{n}$, under the assumption that $a \equiv b \pmod{g}$ where $g = (m, n)$: We use the Euclidean algorithm to find integers $r$ and $s$ for which $mr + ns = g$. Then we can take $x \equiv c \pmod{L}$ where $L = [m, n]$ and
$$c = \frac{ans + bmr}{g} = a + mr \cdot \frac{b - a}{g} = b + ns \cdot \frac{a - b}{g},$$
since this construction immediately implies that $c \equiv a \pmod{m}$ and $c \equiv b \pmod{n}$. For a practical algorithm with more moduli simply combine the first two congruences, then the answer with the third congruence, etc.
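Here is a Python sketch of this factoring-free approach (names are ours, for illustration): the extended Euclidean algorithm supplies $r$ and $s$ with $mr + ns = g$, and each new congruence is folded into the running one using $c = a + mr\,(b - a)/g$, which is $\equiv a \pmod{m}$ and, because $mr = g - ns$, also $\equiv b \pmod{n}$.

```python
def ext_gcd(m, n):
    """Return (g, r, s) with m*r + n*s = g = gcd(m, n), by the extended Euclidean algorithm."""
    if n == 0:
        return m, 1, 0
    g, r, s = ext_gcd(n, m % n)
    return g, s, r - (m // n) * s

def combine(a, m, b, n):
    """Combine x ≡ a (mod m) and x ≡ b (mod n) into x ≡ c (mod L), L = lcm[m, n].
    Raises ValueError when a ≢ b (mod gcd(m, n)), in which case there is no solution."""
    g, r, s = ext_gcd(m, n)
    if (b - a) % g:
        raise ValueError("incompatible congruences")
    L = m // g * n
    c = a + m * r * ((b - a) // g)
    return c % L, L

def combine_all(congruences):
    """Fold the congruences (a_j, m_j) in one at a time, starting from x ≡ 0 (mod 1)."""
    a, m = 0, 1
    for b, n in congruences:
        a, m = combine(a, m, b, n)
    return a, m

print(combine(-4, 35, 17, 504))                          # (521, 2520)
print(combine_all([(-4, 35), (17, 504), (1, 16)]))       # (3041, 5040)
```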




A representation of the solution modulo $m$. Suppose that the moduli $m_1, \ldots, m_k$ are again pairwise coprime. Dividing through by $m$, the equation (3.7.1) is equivalent to
$$\frac{x}{m} \equiv \frac{a_1 b_1}{m_1} + \frac{a_2 b_2}{m_2} + \cdots + \frac{a_k b_k}{m_k} \pmod{1}.$$
The difference between the two sides is an integer, which we denote by $n$, so we let $x_1 = a_1 b_1 + n m_1$ and $x_j = a_j b_j$ for all $j \geq 2$, to obtain
$$\frac{x}{m} = \frac{x_1}{m_1} + \frac{x_2}{m_2} + \cdots + \frac{x_k}{m_k}.$$
If each $m_j$ is a power of a different prime, then this proves that one can always decompose a fraction with a composite denominator $\prod_p p^{e_p}$ into a sum of fractions whose denominators are the prime powers $p^{e_p}$ and whose numerators are fixed mod $p^{e_p}$.
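The decomposition can be computed directly. For each prime power $q = p^{e_p} \| m$, comparing both sides mod $q$ shows that the numerator must satisfy $x_q \equiv x\,(m/q)^{-1} \pmod{q}$; one numerator then absorbs the integer difference. A Python sketch (names are ours; it assumes $m > 1$ and Python 3.8+ for modular inverses via `pow`):

```python
from fractions import Fraction

def factor(n):
    """Trial-division factorization of n >= 1 as a dict {p: e}."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def prime_power_parts(x, m):
    """Return [(x_1, q_1), ..., (x_k, q_k)] with x/m = x_1/q_1 + ... + x_k/q_k exactly,
    where the q_j are the prime powers exactly dividing m (assumes m > 1)."""
    qs = [p ** e for p, e in factor(m).items()]
    nums = [x * pow(m // q, -1, q) % q for q in qs]   # x_q ≡ x (m/q)^(-1) (mod q)
    diff = Fraction(x, m) - sum(Fraction(a, q) for a, q in zip(nums, qs))   # an integer
    nums[0] += int(diff) * qs[0]
    return list(zip(nums, qs))

print(prime_power_parts(1, 140))   # [(-5, 4), (2, 5), (6, 7)]
```

Indeed $-\frac{5}{4} + \frac{2}{5} + \frac{6}{7} = \frac{-175 + 56 + 120}{140} = \frac{1}{140}$.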

Exercise 3.14.2. Find all integers $n$ satisfying $13n \equiv 407 \pmod{175}$ and $55n \equiv 29 \pmod{63}$.


Exercise 3.14.3. Suppose that integer $m > 0$ is given. Prove that there exist infinitely many integers $n$ such that $n + j$ is divisible by $m + j$ for $j = 1, 2, \ldots, 100$.


Appendix 3C. Groups and rings


For any ring $A$ we define $A^*$ to be the set of elements of $A$ with a multiplicative inverse, so that $A$ is an additive group and $A^*$ forms a multiplicative group. Therefore, by Corollary 3.5.2 and exercise 3.5.4,
$$(\mathbb{Z}/m\mathbb{Z})^* = \{a \pmod{m} : (a, m) = 1\},$$
the reduced residues mod $m$, form a multiplicative group.


In order for $A$ to be a field we need multiplication to be commutative and for $A^*$ to equal $A \setminus \{0\}$. Multiplication is commutative in $\mathbb{Z}/m\mathbb{Z}$, and so we need only verify whether
$$\{a \pmod{m} : (a, m) = 1\} \quad \text{equals} \quad \{a \pmod{m} : a \not\equiv 0 \pmod{m}\}.$$
If $m$ has a non-trivial divisor $d$, then $d$ belongs to the second set but not the first. Otherwise $m = p$ is a prime, and the two sets are obviously the same (since $1, 2, \ldots, p - 1$ are all coprime to $p$). Hence $\mathbb{Z}/m\mathbb{Z}$ is a field if and only if $m = p$ is prime.
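A quick numerical check of this criterion, as a Python sketch (names are ours): list the reduced residues mod $m$ and compare them with the non-zero classes.

```python
from math import gcd

def reduced_residues(m):
    """The classes a (mod m), 0 <= a < m, with (a, m) = 1."""
    return [a for a in range(m) if gcd(a, m) == 1]

def is_field(m):
    """Z/mZ (m >= 2) is a field iff the reduced residues are exactly the non-zero classes."""
    return reduced_residues(m) == list(range(1, m))

def is_prime(m):
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))

print([m for m in range(2, 30) if is_field(m)])   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```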


3.15. A direct sum 


The points in $\mathbb{R}^2$ (also called the complex plane) are a 2-dimensional vector space over $\mathbb{R}$. These points, usually written $(x, y)$, form an additive group, the operation of addition working separately on the $x$- and $y$-coordinates. One can view this as a “pasting together” of two copies of the additive group $\mathbb{R}$, technically called a “direct sum”. More generally the direct sum of the groups $(G_1, *_1), (G_2, *_2), \ldots, (G_k, *_k)$ is denoted by
$$G_1 \oplus G_2 \oplus \cdots \oplus G_k$$
and has the group operation
$$(g_1, \ldots, g_k) * (h_1, \ldots, h_k) = (g_1 *_1 h_1, \ldots, g_k *_k h_k)$$
where $g_j, h_j \in G_j$ for each $j$.


Exercise 3.15.1. Verify that a direct sum of two groups indeed forms a group. 


The order of an element $g$ of a group $G$ is the least positive integer $m$, if it exists, for which $g^m = 1$. If no such $m$ exists, then $g$ has infinite order. For example $3 \in (\mathbb{Z}/5\mathbb{Z})^*$ has order 4, whereas $-1 \in (\mathbb{Z}/n\mathbb{Z})^*$ has order 2 for all $n \geq 3$. In the additive group $(\mathbb{Z}, +)$, where 0 is the identity element, all the non-zero elements have infinite order.


Two groups $G$ and $H$ are isomorphic, and we write $G \cong H$, if there is a 1-to-1 correspondence $\phi: G \to H$ such that the group operation is conserved by the map. In other words, $\phi(a *_G b) = \phi(a) *_H \phi(b)$ for every $a, b \in G$, where $*_G$ is the group operation in $G$ and $*_H$ is the group operation in $H$.

For example $(\mathbb{Z}/5\mathbb{Z})^* \cong \mathbb{Z}/4\mathbb{Z}$, as may be seen by mapping $\phi(2) = 1$ and then $\phi(4) = \phi(2^2) = 2 \cdot 1 = 2 \pmod{4}$, $\phi(3) = \phi(2^3) = 3 \cdot 1 = 3 \pmod{4}$, and $\phi(1) = \phi(2^4) = 4 \cdot 1 \equiv 0 \pmod{4}$. To verify that the group operation is preserved by the map we have, for example, $3 \cdot 4 \equiv 2 \pmod{5}$ and $\phi(3) + \phi(4) = 3 + 2 \equiv 1 = \phi(2) \pmod{4}$.
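Orders and the map $\phi$ are easy to compute by repeated multiplication. In this Python sketch (names are ours), `log_map(g, p)` builds $\phi(g^k) = k \bmod (p - 1)$ under the assumption that $g$ has order $p - 1$ modulo the prime $p$, as 2 does modulo 5; the last line checks that $\phi$ turns multiplication mod 5 into addition mod 4 for every pair.

```python
def order(g, n):
    """Multiplicative order of g in (Z/nZ)^*; assumes n >= 2 and (g, n) = 1."""
    k, x = 1, g % n
    while x != 1:
        x = x * g % n
        k += 1
    return k

def log_map(g, p):
    """The map φ(g^k mod p) = k mod (p - 1), assuming g has order p - 1 modulo the prime p."""
    phi, x = {}, 1
    for k in range(p - 1):
        phi[x] = k
        x = x * g % p
    return phi

phi = log_map(2, 5)
print(order(3, 5), phi)   # 4 {1: 0, 2: 1, 4: 2, 3: 3}
print(all(phi[a * b % 5] == (phi[a] + phi[b]) % 4 for a in phi for b in phi))   # True
```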

The Chinese Remainder Theorem states that there is a 1-to-1 correspondence between the residue classes $a \pmod{m}$ and the “vector” of residue classes $(a_1 \pmod{m_1}, a_2 \pmod{m_2}, \ldots, a_k \pmod{m_k})$, where $a_j \equiv a \pmod{m_j}$, when the $m_j$'s are pairwise coprime and their product equals $m$. Moreover the group operation of addition is conserved, so we can write
$$\mathbb{Z}/m\mathbb{Z} \cong \mathbb{Z}/m_1\mathbb{Z} \oplus \mathbb{Z}/m_2\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/m_k\mathbb{Z}$$
$$a \pmod{m} \longleftrightarrow (a_1 \pmod{m_1}, a_2 \pmod{m_2}, \ldots, a_k \pmod{m_k}).$$


But this isomorphism goes beyond mere addition. It also works for multiplication performed componentwise; there is a 1-to-1 correspondence between the reduced residue classes modulo $m$ and the reduced residue classes modulo the $m_j$, which we write as
$$(\mathbb{Z}/m\mathbb{Z})^* \cong (\mathbb{Z}/m_1\mathbb{Z})^* \oplus (\mathbb{Z}/m_2\mathbb{Z})^* \oplus \cdots \oplus (\mathbb{Z}/m_k\mathbb{Z})^*.$$


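When each $m_j$ is a distinct prime power $p^{e_p}$, so that $m = \prod_p p^{e_p}$, then we have
$$\mathbb{Z}/m\mathbb{Z} \cong \bigoplus_{p:\ p^{e_p} \| m} \mathbb{Z}/p^{e_p}\mathbb{Z} \quad \text{and} \quad (\mathbb{Z}/m\mathbb{Z})^* \cong \bigoplus_{p:\ p^{e_p} \| m} (\mathbb{Z}/p^{e_p}\mathbb{Z})^*.$$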
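The correspondence can be verified exhaustively for small moduli. The following Python sketch (names are ours) checks, for pairwise coprime moduli with product $m$, that $a \mapsto (a \bmod m_1, \ldots, a \bmod m_k)$ is a bijection respecting both addition and multiplication, and that it maps the reduced residues mod $m$ onto the tuples of reduced residues; with moduli that are not coprime the check fails.

```python
from math import gcd, prod
from itertools import product

def crt_map(a, moduli):
    """a (mod m) -> (a mod m_1, ..., a mod m_k)."""
    return tuple(a % q for q in moduli)

def check_crt_isomorphism(moduli):
    """True if a -> (a mod m_j)_j is a bijection Z/mZ -> ⊕ Z/m_jZ (m = product of moduli)
    that preserves + and ×, and maps reduced residues onto tuples of reduced residues."""
    m = prod(moduli)
    img = [crt_map(a, moduli) for a in range(m)]
    if len(set(img)) != m:
        return False
    for a in range(m):
        for b in range(m):
            add = tuple((x + y) % q for x, y, q in zip(img[a], img[b], moduli))
            mul = tuple((x * y) % q for x, y, q in zip(img[a], img[b], moduli))
            if img[(a + b) % m] != add or img[(a * b) % m] != mul:
                return False
    unit_img = {img[a] for a in range(m) if gcd(a, m) == 1}
    unit_tuples = set(product(*[[x for x in range(q) if gcd(x, q) == 1] for q in moduli]))
    return unit_img == unit_tuples

print(check_crt_isomorphism((4, 3)), check_crt_isomorphism((4, 3, 5)))   # True True
print(check_crt_isomorphism((2, 4)))                                     # False
```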


Exercise 3.15.2.' Give an example of an additive group G and a subgroup H for which G is not 
isomorphic with H ® G/H. 


3.16. The structure of finite abelian groups 


An abelian group $G$ is generated by $g_1, g_2, \ldots, g_k$ if every element takes the form
$$g_1^{a_1} g_2^{a_2} \cdots g_k^{a_k} \tag{3.16.1}$$
where each $a_j \in \mathbb{Z}$. The $g_j$ are the generators, and we write $G = \langle g_1, g_2, \ldots, g_k \rangle$. If $G$ is finite, then each element of $G$ has an order, and if $g_j$ has order $m_j$, then $g_j^{a_j} = g_j^{b_j}$ if and only if $a_j \equiv b_j \pmod{m_j}$. This implies that we may take each $a_j$ in (3.16.1) to be between 0 and $m_j - 1$. A solution to $g_1^{a_1} g_2^{a_2} \cdots g_k^{a_k} = 1$ is a multiplicative dependence if not all of the $g_j^{a_j}$ equal 1. If the $g_j$ have no multiplicative dependence, then they are multiplicatively independent, so that the exponents on the $g_j$ work independently from each other, and therefore the group $G$ is isomorphic to
$$\mathbb{Z}/m_1\mathbb{Z} \oplus \mathbb{Z}/m_2\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/m_k\mathbb{Z}.$$
We now show that every finite abelian group has this structure:


Theorem 3.9 (Fundamental Theorem of Abelian Groups). Any finite abelian group $G$ may be written as
$$\mathbb{Z}/m_1\mathbb{Z} \oplus \mathbb{Z}/m_2\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/m_k\mathbb{Z}.$$


Proof. Every finite abelian group $G$ contains a set of generators: One could simply take all of the elements of $G$, but we want a set of generators of minimal size. So suppose that $g_1, \ldots, g_k$ is a set of generators of $G$ of minimal size. For each $j$, let $m_j$ denote the order of $g_j$, so that $g_j^r = g_j^a$ where $a$ is the least residue of $r \pmod{m_j}$. Therefore all of the elements of $G$ can be expressed in the form
$$g_1^{a_1} g_2^{a_2} \cdots g_k^{a_k} \quad \text{with } 0 \leq a_j \leq m_j - 1 \text{ for each } j. \tag{3.16.2}$$
If these are all distinct, then $G$ is the direct sum of cyclic groups with generators $g_1, \ldots, g_k$, respectively. Otherwise two different expressions in (3.16.2) must represent the same element of $G$. Dividing one by the other we determine an element $g_1^{a_1} \cdots g_k^{a_k}$ in (3.16.2) which equals 1, but where the $a_j$ are not all $0 \pmod{m_j}$. Moreover, as $g_j^{a_j} \neq 1$ if $a_j \not\equiv 0 \pmod{m_j}$, we deduce that at least two of the $a_j$'s are not $0 \pmod{m_j}$.

Select that set of $k$ generators of $G$ in which $a_1 + \cdots + a_k$ is minimal in this dependence. At least two of the $a_j$ are non-zero, say $1 \leq a_1 \leq a_2$, so we replace our set of generators with another set of generators $h_1, \ldots, h_k$, where $h_1 = g_1 g_2$ and $h_j = g_j$ for all $j \geq 2$ (these still generate $G$, since $g_1 = h_1 h_2^{-1}$). The multiplicative dependence now becomes $h_1^{a_1} h_2^{a_2 - a_1} h_3^{a_3} \cdots h_k^{a_k} = 1$, contradicting the minimality of the exponents of the $g_j$-dependence.


As each $\mathbb{Z}/m_j\mathbb{Z}$ can be written as a direct sum of cyclic groups of prime power order (by the Chinese Remainder Theorem), the Fundamental Theorem of Abelian Groups implies we can write any finite abelian group $G$ as a direct sum of cyclic groups of prime power order. Then the $p$-part of the group $G$ can be written as
$$G_p = \mathbb{Z}/p^{e_1}\mathbb{Z} \oplus \mathbb{Z}/p^{e_2}\mathbb{Z} \oplus \cdots \quad \text{where } e_1 \geq e_2 \geq \cdots,$$
which is the subgroup of $G$ of elements whose order is a power of $p$. If $n = |G|$, then
$$G = \bigoplus_{p \mid n} G_p.$$
Let $n_1$ be the product of the $p^{e_1}$'s, then $n_2$ the product of the $p^{e_2}$'s, etc., so, by the Chinese Remainder Theorem,
$$G \cong \mathbb{Z}/n_1\mathbb{Z} \oplus \mathbb{Z}/n_2\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/n_\ell\mathbb{Z} \quad \text{where } n_\ell \mid n_{\ell-1} \mid \cdots \mid n_2 \mid n_1.$$
We deduce that $g^{n_1} = 1$ for any $g \in G$.
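For example, $(\mathbb{Z}/15\mathbb{Z})^* \cong (\mathbb{Z}/3\mathbb{Z})^* \oplus (\mathbb{Z}/5\mathbb{Z})^* \cong \mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/4\mathbb{Z}$, so that $n_1 = 4$ and $n_2 = 2$. The Python sketch below (names are ours) computes all element orders in $(\mathbb{Z}/n\mathbb{Z})^*$ and confirms that the largest order equals the lcm of all the orders, so that $g^{n_1} = 1$ throughout.

```python
from math import gcd
from functools import reduce

def order(g, n):
    """Multiplicative order of g modulo n; assumes n >= 2 and (g, n) = 1."""
    k, x = 1, g % n
    while x != 1:
        x = x * g % n
        k += 1
    return k

def unit_orders(n):
    """Map each reduced residue g (mod n) to its order in (Z/nZ)^*."""
    return {g: order(g, n) for g in range(1, n) if gcd(g, n) == 1}

def lcm(a, b):
    return a * b // gcd(a, b)

orders = unit_orders(15)
n1 = max(orders.values())
print(n1, reduce(lcm, orders.values()))          # 4 4
print(all(pow(g, n1, 15) == 1 for g in orders))  # True
```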


Appendix 3D. Unique factorization revisited


Does an analogy to the Fundamental Theorem of Arithmetic hold for other sets of mathematical objects, other than the positive integers? For example, do polynomials factor in a unique way into irreducibles? Or numbers of the form $\{a + b\sqrt{d} : a, b \in \mathbb{Z}\}$? How about other simple arithmetic sets? How about group elements? Or specifically the permutation group?


3.17. The Fundamental Theorem of Arithmetic, clarified 


Why did we only state the Fundamental Theorem of Arithmetic for positive integers? What about negative integers? To access the negative integers $-1, -2, \ldots$ we need to factor into primes and powers of $-1$, but the powers of $-1$ are not distinct, which seems like a big problem. However underlying this issue is that $-1$ is a unit, a number that divides 1. In the integers the only units are $-1$ and 1, and with this concept the Fundamental Theorem of Arithmetic can be neatly reformulated as follows:


Every non-zero integer can be written uniquely as a unit times a product of primes.


In other sets of numbers the units can be more complicated. For example in the set $\mathbb{Z}[i] := \{a + bi : a, b \in \mathbb{Z}\}$ we have the units $i$ and $-i$ as well as 1 and $-1$. This is because $i$ is a fourth root of 1, and similar remarks may be made about sets generated, in the analogous manner, by the $n$th roots of unity (that is, the roots of $x^n - 1$). More interestingly, there are domains with infinitely many units that are not roots of unity. For example in $\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$, we start with $3 + 2\sqrt{2}$, which divides $(3 + 2\sqrt{2})(3 - 2\sqrt{2}) = 9 - 8 = 1$, and then take $a + b\sqrt{2} = (3 + 2\sqrt{2})^k$ for any integer $k$.
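The powers of $3 + 2\sqrt{2}$ can be computed exactly, with integer pairs $(a, b)$ standing for $a + b\sqrt{2}$. In this Python sketch (names are ours), each power satisfies $a^2 - 2b^2 = (a + b\sqrt{2})(a - b\sqrt{2}) = 1$, which exhibits its inverse $a - b\sqrt{2}$ in $\mathbb{Z}[\sqrt{2}]$.

```python
def mul(x, y):
    """(a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2, with elements stored as pairs (a, b)."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def norm(x):
    """a^2 - 2b^2 = (a + b√2)(a - b√2)."""
    a, b = x
    return a * a - 2 * b * b

unit, powers = (3, 2), [(1, 0)]
for _ in range(5):
    powers.append(mul(powers[-1], unit))
print(powers)                       # [(1, 0), (3, 2), (17, 12), (99, 70), (577, 408), (3363, 2378)]
print({norm(x) for x in powers})    # {1}
```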


Exercise 3.17.1. Prove that the numbers $(3 + 2\sqrt{2})^k$ with $k = 1, 2, 3, \ldots$ are distinct units in $\mathbb{Z}[\sqrt{2}]$.




The Fundamental Theorem of Algebra (Theorem 3.10) states that any polynomial $f(x) \in \mathbb{C}[x]$ of degree $d \geq 1$ has exactly $d$ complex roots, counted with multiplicity. Therefore $f(x)$ factors into polynomials of the form $x - \alpha$ with $\alpha \in \mathbb{C}$, times a non-zero constant. Therefore polynomials can be uniquely factored: the role of the primes is played by the polynomials of the form $x - \alpha$, and the role of the units by the non-zero constants.


Exercise 3.17.2. Let R be a set of numbers containing 1 which is closed under multiplication. 
Prove that the units of R form a group. 


Exercise 3.17.3. Let $f(x)$ be the minimum polynomial for $u \in R$, and let $d$ be its degree.
(a) Prove that $x^d f(1/x)$ is the minimum polynomial for $1/u$.
(b) Deduce that $u$ is a unit if and only if $f(0)$ equals 1 or $-1$.


3.18. When unique factorization fails 


Let R be a set of numbers containing 1, which is closed under multiplication (for 
example, the positive integers). We call an element of R irreducible if it does not 
factor into two non-unit elements of R. We begin with a well-known example. 


Exercise 3.18.1. The set of positive integers, $F$, which are $\equiv 1 \pmod{4}$, is closed under multiplication and contains 1. Note that 21 is irreducible in $F$, despite not being prime in the positive integers. Show that factorization into irreducibles is not unique in $F$.


This might seem to be an artificial example, so we give a more convincing one: 


Proposition 3.18.1. Let $R$ be the ring $\mathbb{Z}[\sqrt{-5}] := \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$. The elements $2, 3, 1 + \sqrt{-5}$, and $1 - \sqrt{-5}$ are all irreducible in $R$, and therefore 6 factors in two different ways,
$$6 = 2 \cdot 3 = (1 + \sqrt{-5}) \cdot (1 - \sqrt{-5}),$$
into irreducibles of $R$. Hence there are elements of $R$ that do not factor into irreducible elements of $R$ in a unique way.


Exercise 3.18.2 (Proof of Proposition 3.18.1). Let's suppose that rational prime $p = (a + b\sqrt{-5})(c + d\sqrt{-5})$ with $(a, b) = (c, d) = 1$.
(a) By studying the coefficient of the imaginary part, prove that $c + d\sqrt{-5} = \pm(a - b\sqrt{-5})$.
(b) Deduce that $p = a^2 + 5b^2$ and that this is impossible for $p = 2$ and $p = 3$. Deduce that 2 and 3 are irreducible in $R$.
(c) Now assume that $1 + \sqrt{-5} = (a + b\sqrt{-5})(c + d\sqrt{-5})$. Multiply this with its complex conjugate to prove that $6 = (a^2 + 5b^2)(c^2 + 5d^2)$.
(d) Deduce that one of $a^2 + 5b^2$ and $c^2 + 5d^2$ equals 1, and therefore either $a + b\sqrt{-5}$ or $c + d\sqrt{-5}$ is a unit. Deduce that $1 + \sqrt{-5}$ and $1 - \sqrt{-5}$ are irreducible in $R$.
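The key input to this exercise is that $a^2 + 5b^2$ never equals 2 or 3. A brute-force Python sketch (names are ours) lists the pairs $(a, b)$ with $a^2 + 5b^2 = N$ for small $N$; since $|a|, |b| \leq \sqrt{N}$, the search is complete. It is a numerical check, not a substitute for the argument.

```python
from math import isqrt

def norm(a, b):
    """N(a + b√-5) = (a + b√-5)(a - b√-5) = a^2 + 5b^2."""
    return a * a + 5 * b * b

def represented(N):
    """All (a, b) with a^2 + 5b^2 = N; |a|, |b| <= isqrt(N), so the search is complete."""
    bound = isqrt(N)
    return [(a, b) for a in range(-bound, bound + 1) for b in range(-bound, bound + 1)
            if norm(a, b) == N]

print(represented(2), represented(3))   # [] []
print(represented(6))                   # [(-1, -1), (-1, 1), (1, -1), (1, 1)]
```

So the only elements of norm 6 are $\pm 1 \pm \sqrt{-5}$, in line with parts (c) and (d).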


To fix what is wrong in these examples we will change our definition of a prime, 
using one of their properties over the integers which generalizes nicely into other 
rings. The idea is to work with the set of numbers that $p$ divides, that is, the ideal $I_{\mathbb{Z}}(p)$, rather than the numbers that divide $p$.


3.19. Defining ideals and factoring 


Let $R$ be a set of numbers that is closed under addition and subtraction, for example $\mathbb{Z}, \mathbb{Q}, \mathbb{R}$, or $\mathbb{C}$, but not $\mathbb{N}$. We define the ideal generated by $\alpha_1, \ldots, \alpha_k$ over $R$ to be the set of linear combinations of $\alpha_1, \ldots, \alpha_k$ with coefficients in $R$; that is,
$$I_R(\alpha_1, \ldots, \alpha_k) = \{ r_1\alpha_1 + r_2\alpha_2 + \cdots + r_k\alpha_k : r_1, r_2, \ldots, r_k \in R \}$$
($\alpha_1, \ldots, \alpha_k$ are not necessarily in $R$). In Corollary 3.1 and exercise 1.7.16 we saw that any ideal over $\mathbb{Z}$ can be generated by just one element, but this is not necessarily true when the $\alpha_j$ are taken from other domains. For example if $R = \mathbb{Z}[\sqrt{-5}]$, that is, the numbers of the form $u + v\sqrt{-5}$ where $u$ and $v$ are integers, then the ideal $I_R(2, 1 + \sqrt{-5})$ cannot be generated by just one element, as we will see below.


A principal ideal is an ideal that can be generated by just one element. As every ideal in $\mathbb{Z}$ is principal (exercise 1.7.16), $\mathbb{Z}$ is called a principal ideal domain.


Exercise 3.19.1. Prove that if $I_R(\alpha) = I_R(\beta)$, then $\beta = u\alpha$ for some unit $u \in R$.


Exercise 3.19.2. Prove that every Euclidean domain (as defined in section of appendix 2B) 
is a principal ideal domain. 


To prove that every principal ideal domain has unique factorization, we can show that an element is prime if and only if it is irreducible (see section 8.3 of [DF04]).


One can multiply ideals together by multiplying together pairs of elements of the ideals and establish that
$$I_R(\alpha, \beta) \cdot I_R(\gamma, \delta) = I_R(\alpha\gamma, \alpha\delta, \beta\gamma, \beta\delta).$$
Therefore if $n = ab$ in $R$, then $I_R(n) = I_R(a) I_R(b)$. All issues with units disappear, for if $I$ is an ideal and $u$ a unit, then $I = uI$.


Prime ideals are ideals that cannot be factored into two other ideals. (We also call an element $p$ of $R$ prime if $I_R(p)$ is a prime ideal.) In some rings $R$, the notions of “irreducible” and “prime” are not the same.


In “number rings” R, all ideals can be factored into prime ideals in a unique 
way, and so we get unique factorization. Note though that prime ideals are no 
longer elements of the ring or even necessarily principal ideals of the ring. 

In our example $6 = 2 \cdot 3 = (1 + \sqrt{-5}) \cdot (1 - \sqrt{-5})$ above, all of $2, 3, 1 + \sqrt{-5}, 1 - \sqrt{-5}$ are irreducibles of $\mathbb{Z}[\sqrt{-5}]$, but none generate prime ideals. In fact we can factor the ideals they generate into prime ideals as
$$I_R(2) = I_R(2, 1 + \sqrt{-5}) \cdot I_R(2, 1 - \sqrt{-5}),$$
$$I_R(3) = I_R(3, 1 + \sqrt{-5}) \cdot I_R(3, 1 - \sqrt{-5}),$$
$$I_R(1 + \sqrt{-5}) = I_R(2, 1 + \sqrt{-5}) \cdot I_R(3, 1 + \sqrt{-5}),$$
$$I_R(1 - \sqrt{-5}) = I_R(2, 1 - \sqrt{-5}) \cdot I_R(3, 1 - \sqrt{-5}).$$


None of these prime ideals are principal, for if, say, $I_R(2, 1 + \sqrt{-5}) = I_R(\alpha)$, then 2 would factor into $\alpha\beta$ for some $\beta \in R$, which contradicts Proposition 3.18.1. These multiplications work out since, for example, the product
$$I_R(2, 1 + \sqrt{-5}) \cdot I_R(2, 1 - \sqrt{-5}) = I_R\big(4,\ 2(1 - \sqrt{-5}),\ 2(1 + \sqrt{-5}),\ 6\big),$$
but then both 4 and 6 belong to this ideal, and so $2 = 6 - 4$ does too. Since 2 divides all four of these generators, the product equals $I_R(2)$.




Any prime of R is irreducible, but not vice versa. Davenport asked for the 
maximum number of prime ideal factors an irreducible integer can have in R. 


Exercise 3.19.3. Prove that if $\alpha, \beta \in I_R$ and $r, s \in R$, then the linear combination $r\alpha + s\beta \in I_R$.


Exercise 3.19.4. Prove that if $\alpha_1, \ldots, \alpha_k \in R$, then $I_R(\alpha_1, \ldots, \alpha_k)$ is a ring.


Proposition 3.19.1. If $R$ is a principal ideal domain with $\alpha \in R$, then $\alpha$ is irreducible
if and only if $I_R(\alpha)$ is a prime ideal. Moreover $R$ has unique factorization.


Proof. Suppose that $I_R(\alpha)$ factors into two non-trivial ideals. These must both
be principal as $R$ is a principal ideal domain; that is, there exist $\beta, \gamma \in R$ such
that $I_R(\alpha) = I_R(\beta)I_R(\gamma) = I_R(\beta\gamma)$, and so $\alpha = u\beta\gamma$ for some unit $u$, by exercise 3.19.1.
That is, $\alpha$ is reducible. On the other hand, if $\alpha$ is reducible, say $\alpha = \beta\gamma$,
then $I_R(\alpha) = I_R(\beta)I_R(\gamma)$ and so $I_R(\alpha)$ is not a prime ideal.


$I_R(\alpha)$ can be uniquely factored into prime ideals, which must be principal since
$R$ is a principal ideal domain. We write $I_R(\alpha) = I_R(\pi_1)\cdots I_R(\pi_k) = I_R(\pi_1\cdots\pi_k)$,
where each $\pi_i$ is irreducible (by the first part) and $\alpha = u\pi_1\cdots\pi_k$ for some unit $u$, by exercise 3.19.1.

We need to prove that this factorization is unique. If $\alpha = \gamma_1\cdots\gamma_\ell$ is another factorization into irreducibles, then
$I_R(\gamma_1)\cdots I_R(\gamma_\ell) = I_R(\alpha) = I_R(\pi_1)\cdots I_R(\pi_k)$. By the unique factorization of prime
ideals these can be paired off, say (perhaps after some rearrangement) $I_R(\gamma_j) = I_R(\pi_j)$
for each $j$, with $k = \ell$. Then $\gamma_j = \pi_j u_j$ for some unit $u_j$, for each $j$, and so
the two factorizations differ only by units.


3.20. Bases for ideals in quadratic fields 


One can determine a basis for a given ideal of $\mathbb{Z}[\sqrt d] := \{a + b\sqrt d : a, b \in \mathbb{Z}\}$, where
$d$ is a non-square integer, which takes a special and convenient form:


Every ideal $I$ of $\mathbb{Z}[\sqrt d]$ is either principal or can be written in the form
$$I = I_{\mathbb{Z}}(s)\cdot I_{\mathbb{Z}}(b + \sqrt d, a),$$
for some integers $s, a, b$ where $a$ divides $b^2 - d$.


Exercise 3.20.1. Let $I$ be an ideal of $\mathbb{Z}[\sqrt d]$, and let $s$ be the smallest positive integer for which
there is some $r + s\sqrt d \in I$.

(a) Prove that if $u + v\sqrt d \in I$, then $s$ divides $v$.

(b)† Deduce that there exists an integer $m$ for which $I = I_{\mathbb{Z}}(r + s\sqrt d, m)$.

(c) Prove that $s$ divides both $r$ and $m$, and so deduce the claimed form of the ideal.

(d) Prove that $a$ divides $b^2 - d$.


Exercise 3.20.2. Let $R = \mathbb{Z}[\sqrt d]$ where $d$ is a squarefree integer. Let $I = I(b + \sqrt d, a)$ where $a$
divides $b^2 - d$.

(a) Prove that $I$ is principal if and only if $|a| = 1$ or $|b^2 - d|$.
(b) Let $I^c := I(b - \sqrt d, a)$. Prove that $I\cdot I^c = (a)J$ where $J$ is a principal ideal dividing $(2)$.
(c) Prove that $I$ is a prime ideal if and only if $I^c$ is a prime ideal.
(d) Prove that if $I$ is a non-principal prime ideal, then $I\cdot I^c = (p)$ where $p$ is a prime number.


Suppose $p \nmid r + s\sqrt d$ but $p \mid r^2 - ds^2$. Then $I_R(p, r + s\sqrt d)$ is principal if and
only if there exist integers $m, n$ with $p = m^2 - dn^2$ and $ms \equiv rn \pmod p$. In that
case we have $I_R(p, r + s\sqrt d) = I_R(m + n\sqrt d)$.

Therefore, for example, $I_R(2, 1+\sqrt{-5})$ and $I_R(3, 1+\sqrt{-5})$ are non-principal,
since there do not exist integers $m, n$ for which $m^2 + 5n^2 = 2$ or $3$.
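When $d < 0$ the criterion above can be checked by a finite search, since $m^2 - dn^2 = p$ forces $|m| \le \sqrt p$ and $|n| \le \sqrt{p/|d|}$. The Python sketch below is only an illustration of that criterion; the function name `principal_generator` is ours, and it assumes $d < 0$, $p$ prime, and the hypotheses on $r, s$ stated above.

```python
from math import isqrt


def principal_generator(p, r, s, d):
    """Look for m, n with p = m^2 - d*n^2 and m*s = r*n (mod p).

    Sketch for d < 0 only, where the equation has finitely many solutions.
    By the criterion in the text, a hit (m, n) means
    I_R(p, r + s*sqrt(d)) = I_R(m + n*sqrt(d)); None means non-principal.
    """
    if d >= 0:
        raise ValueError("this finite search assumes d < 0")
    m_max, n_max = isqrt(p), isqrt(p // -d)
    for m in range(-m_max, m_max + 1):
        for n in range(-n_max, n_max + 1):
            if m * m - d * n * n == p and (m * s - r * n) % p == 0:
                return (m, n)
    return None


# The two ideals of Z[sqrt(-5)] discussed above: no generator exists.
print(principal_generator(2, 1, 1, -5))   # None
print(principal_generator(3, 1, 1, -5))   # None
```

By contrast, in $\mathbb{Z}[i]$ the call `principal_generator(2, 1, 1, -1)` finds $(-1, -1)$, an associate of $1 + i$.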


Appendix 3E. Gauss’s approach


3.21. Gauss’s approach to Euclid’s Lemma 


Gauss took a slightly different approach to the main technical part of chapter 3, 
developing his theory from a lemma that is equivalent to Euclid’s Lemma (if c 
divides ab and (c,a) =1, then c divides b): 


Corollary 3.21.1. Suppose that gcd(a,c) =1. Then a and c both divide n if and 
only if ac divides n. 


Deduction of Corollary 3.21.1 from Euclid’s Lemma. Since $a$ divides $n$ we
can write $n = ab$ for some integer $b$. Then $c$ divides $n = ab$ with $(c,a) = 1$, and so
$c$ divides $b$ by Euclid’s Lemma. Hence we can write $b = cd$ for some integer $d$ and
so $n = acd$. This implies that $ac$ must divide $n$. In the other direction note that $a$
and $c$ both divide $ac$, which divides $n$, and so they both divide $n$.


Deduction of Euclid’s Lemma from Corollary 3.21.1. Since $a$ and $c$ both divide
$ab$ and $(a,c) = 1$, therefore $ac$ divides $ab$ by Corollary 3.21.1. Therefore $c$
divides b by exercise [L.T.T[e). 

We also find another proof in exercise 3.2.1.


Another proof of Corollary 3.2.2. [We have $ab = (a,b)[a,b]$.] Let $g = (a,b)$,
and then $A = a/g$ and $B = b/g$, so that $(A,B) = 1$ by exercise 1.2.5(a). Now
$(A,B) = 1$ and $A$ and $B$ both divide $[A,B]$, so $AB$ divides $[A,B]$ by Corollary
3.21.1. However $[A,B] \le AB$, and so we deduce that $[A,B] = AB$.


Therefore, by exercise 1.4.2, we obtain
$$ab = (gA)(gB) = g\cdot gAB = g\cdot g\,\mathrm{lcm}[A,B] = g\cdot \mathrm{lcm}[gA, gB] = g\cdot \mathrm{lcm}[a,b],$$


as claimed. 




Appendix 3F. Fundamental theorems and factoring polynomials


In this appendix we will prove the Unique Factorization Theorem for polynomials 
with complex coefficients, for polynomials with integer coefficients, and for polyno- 
mials mod p. In so doing we will introduce and discuss the Fundamental Theorem 
of Algebra. We then discuss resultants in more detail. 


3.22. The number of distinct roots of polynomials 


In the analogy to the Unique Factorization Theorem for polynomials with integer 
coefficients, the role of the primes is played by the irreducible polynomials f of 
content 1, and the role of the units, $\{1, -1\}$, is played by the integers. We prove
this by induction on the degree of $f$. First, $f$ can be written as a product of
irreducibles, for either $f$ is irreducible or $f$ is the product of two polynomials in
$\mathbb{Z}[x]$ of smaller degree $\ge 1$, which can therefore be factored into irreducibles by the
induction hypothesis, and so f is the concatenation of those two products. The 
proof that the factorization is unique is analogous to the proof over the integers: 


Exercise 3.22.1 (Euclid’s Lemma for polynomials). Suppose that $\gcd(f(x), g(x))_{\mathbb{C}[x]} = 1$ and
$f(x)$ divides $g(x)h(x)$. Deduce that $f(x)$ divides $h(x)$.


The Fundamental Theorem of Algebra is the well-known result that a polynomial
$f(x) = \sum_{j=0}^{d} f_j x^j$ with complex coefficients of degree $d$ has exactly $d$ roots
(where the roots are counted according to their multiplicity; for example, $x(x-1)^2$
has the three roots 0, 1, and 1, the root 1 counted twice). This theorem, used
without proof while a student learns basic mathematics, is rather more subtle to
prove than one might guess. Four of the great mathematicians, Euler (1749), Lagrange
(1772), Laplace (1795), and Gauss (1799), published purported proofs which




had subtle errors, before the first correct proof appeared in 1814 due to Argand. 
The difficulty in proving the Fundamental Theorem of Algebra is the following first 
step, which appears to be innocuous but isn’t. 


Lemma 3.22.1. If $f(x) \in \mathbb{C}[x]$ has degree $d \ge 1$, then $f(x)$ has at least one root
in $\mathbb{C}$.


We will not prove this but we will deduce the Fundamental Theorem from it: 


Theorem 3.10 (The Fundamental Theorem of Algebra). If $f(x) \in \mathbb{C}[x]$ has degree
$d \ge 1$, then $f(x)$ has exactly $d$ roots in $\mathbb{C}$, counted with multiplicity.


Proof. By induction. For $d = 1$ we note that $ax + b$ has the unique root $-b/a$. For
higher degree, $f$ has a root $\alpha$ by Lemma 3.22.1. Take $g(x) = x - \alpha$ in Proposition
2.10.1 to obtain a polynomial $q(x)$ and a constant $r$ for which $f(x) = (x - \alpha)q(x) + r$.
Substituting in $x = \alpha$, we deduce that $r = f(\alpha) = 0$; that is, $f(x) = (x - \alpha)q(x)$.
Now $q(x)$ has degree $d - 1$ so has $d - 1$ roots, counted with multiplicity, by the
induction hypothesis. Therefore $f(x) = (x - \alpha)q(x)$ has $1 + (d - 1) = d$ roots,
counted with multiplicity.


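We note that if $f(x) = \sum_{j=0}^{d} f_j x^j$, then

$$q(x) = \frac{f(x)}{x - \alpha} = \frac{f(x) - f(\alpha)}{x - \alpha} = \sum_{j=0}^{d} f_j\left(\frac{x^j - \alpha^j}{x - \alpha}\right) = \sum_{j=1}^{d} f_j\left(x^{j-1} + \alpha x^{j-2} + \cdots + \alpha^{j-1}\right).$$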
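The formula for $q(x)$ is what synthetic division (Horner's scheme) computes step by step. In the Python sketch below, which is our own illustration, a polynomial is a list of coefficients from the highest degree down, so $x(x-1)^2 = x^3 - 2x^2 + x$ is `[1, -2, 1, 0]`; the function returns the coefficients of $q(x)$ together with the remainder $r = f(\alpha)$.

```python
def divide_by_linear(coeffs, alpha):
    """Divide f(x) by (x - alpha) by synthetic division (Horner's scheme).

    coeffs = [f_d, f_{d-1}, ..., f_0], highest degree first.
    Returns (q, r) with f(x) = (x - alpha) q(x) + r, where r = f(alpha).
    """
    partial, acc = [], 0
    for c in coeffs:
        acc = acc * alpha + c      # running Horner sum; each value is the next coefficient of q(x)
        partial.append(acc)
    r = partial.pop()              # the final running sum is f(alpha)
    return partial, r


# x(x - 1)^2 divided by (x - 1): quotient x^2 - x, remainder f(1) = 0.
print(divide_by_linear([1, -2, 1, 0], 1))   # ([1, -1, 0], 0)
```

When $\alpha$ is a root the remainder is 0, which is the step $r = f(\alpha) = 0$ used in the proof of Theorem 3.10.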


If we do not assume Lemma 3.22.1, then our proof of Theorem 3.10 can be easily
modified to prove the following unconditionally.


Theorem 3.11 (Weak Fundamental Theorem of Algebra, Lagrange (1772)). If
$f(x) \in \mathbb{C}[x]$ has degree $d \ge 1$, then $f(x)$ has no more than $d$ distinct roots in $\mathbb{C}$.


Proof. We proceed by induction on $d$. If $f$ has no roots, then we are done. Otherwise if $f(x)$ has a root, call it
$\alpha$, then we write $f(x) = (x - \alpha)q(x)$ as in the proof of Theorem 3.10. Now $q(x)$
has degree $d - 1$ so, by the induction hypothesis, has $\le d - 1$ distinct roots. Since every root of $f$ is either $\alpha$ or a root of $q$,
this implies that $f(x) = (x - \alpha)q(x)$ has $\le 1 + (d - 1) = d$ distinct roots.


Exercise 3.22.2. (a) Deduce that every irreducible polynomial in $\mathbb{C}[x]$ has degree one.
(b) Prove that the set of $d$ roots of $f(x)$ is unique.
(c) Prove that every irreducible polynomial in $\mathbb{R}[x]$ has degree one or two.


Following exercise 3.22.2, the Fundamental Theorem of Algebra implies that
one can factor any polynomial of $\mathbb{C}[x]$ in a unique way into linear factors. This is
much like the Fundamental Theorem of Arithmetic, the polynomials $x - \alpha$ taking
the place of the primes and the elements of $\mathbb{C}\setminus\{0\}$ taking the place of $\{-1, 1\}$.


The Euclidean algorithm for polynomials 


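As discussed in section 2.10 of appendix 2B, the obvious analogy to Euclid’s algorithm
may be used to find $g(x)$, the polynomial of highest degree which divides two
given polynomials $a(x)$ and $b(x)$ (in $\mathbb{C}[x]$). More precisely, if

$$a(x) = \prod_{t\in\mathbb{C}} (x - t)^{\alpha_t} \quad\text{and}\quad b(x) = \prod_{t\in\mathbb{C}} (x - t)^{\beta_t}, \quad\text{then}\quad g(x) = \prod_{t\in\mathbb{C}} (x - t)^{\min\{\alpha_t, \beta_t\}},$$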




where each $\alpha_t, \beta_t$ is an integer $\ge 0$ and only finitely many are non-zero. The
operations of the Euclidean algorithm respect the field, so if $a(x), b(x) \in \mathbb{Q}[x]$, then
there exist $u(x), v(x) \in \mathbb{Q}[x]$ such that

$$a(x)u(x) + b(x)v(x) = g(x). \qquad (3.22.1)$$
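Because every step of the Euclidean algorithm stays in $\mathbb{Q}[x]$, the polynomials $u(x), v(x)$ of (3.22.1) can be tracked alongside the remainders. Here is a Python sketch with exact rational arithmetic; the helper names are ours, polynomials are coefficient lists with the $x^0$ coefficient first, and $a, b$ are assumed not both zero.

```python
from fractions import Fraction


def trim(p):
    """Remove zero coefficients of highest degree (lists store x^0 first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def scale(p, c):
    return trim([c * x for x in p])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(a, b):
    """Return (q, r) with a = b*q + r and deg r < deg b; coefficients are Fractions."""
    q, r = [], trim(a)
    while r and len(r) >= len(b):
        shift = len(r) - len(b)
        term = [Fraction(0)] * shift + [r[-1] / b[-1]]   # monomial that cancels the top term of r
        q = add(q, term)
        r = add(r, scale(mul(term, b), -1))
    return q, r


def ext_gcd(a, b):
    """Return (g, u, v) with a*u + b*v = g, the monic gcd of a and b in Q[x]."""
    r0, r1 = trim(map(Fraction, a)), trim(map(Fraction, b))
    u0, u1, v0, v1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:                               # invariant: r_i = a*u_i + b*v_i
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, add(u0, scale(mul(q, u1), -1))
        v0, v1 = v1, add(v0, scale(mul(q, v1), -1))
    lead = r0[-1]                           # rescale so that g is monic
    return scale(r0, 1 / lead), scale(u0, 1 / lead), scale(v0, 1 / lead)


# a = x^3 - 3x + 2 = (x - 1)^2 (x + 2),  b = x^2 + 2x - 3 = (x - 1)(x + 3)
g, u, v = ext_gcd([2, -3, 0, 1], [-3, 2, 1])
print(g)   # [Fraction(-1, 1), Fraction(1, 1)], i.e. x - 1
```

Here $u = 1/4$ and $v = 1/2 - x/4$, so the Bézout-type identity genuinely needs rational coefficients even though $a$ and $b$ are in $\mathbb{Z}[x]$.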


As discussed in section 0.14 of appendix 0E, the minimum polynomial $f$ for an
algebraic number $\alpha \in \mathbb{C}$ is in $\mathbb{Z}[x]$, has content 1 and a positive leading coefficient,
and is unique.


Proposition 3.22.1. Let $\alpha \in \mathbb{C}$ and let $f(x) \in \mathbb{Z}[x]$ be the minimum polynomial
for $\alpha$. If $g(x) \in \mathbb{Z}[x]$ with $g(\alpha) = 0$, then $f(x)$ divides $g(x)$ in $\mathbb{Q}[x]$.


Proof. By exercise 2.10.5(c) there exist $q(x), r(x) \in \mathbb{Z}[x]$ and a non-zero $k \in \mathbb{Z}$ with
$0 \le \deg r \le \deg f - 1$, such that $kg(x) = q(x)f(x) + r(x)$. Hence $r(\alpha) = kg(\alpha) - q(\alpha)f(\alpha) = 0$
and $r(x)$ has smaller degree than $f(x)$. As $f$ is the minimum polynomial
for $\alpha$, the only possibility for $r(x)$ is 0. Hence $kg(x) = q(x)f(x)$ and the
result follows.


Exercise 3.22.3. Show that if $\alpha$ has minimal polynomial $f(x)$ and $\beta$ is another root of $f(x)$,
then $f(x)$ is also the minimum polynomial for $\beta$.


Another result that will be useful is the following:


Lemma 3.22.2. If $f(x), g(x) \in \mathbb{Q}[x]$ are monic and $f(x)g(x) \in \mathbb{Z}[x]$, then $f(x)$
and $g(x) \in \mathbb{Z}[x]$.


Proof. If the conclusion is false, then some coefficient of, say, $f(x)$ is not an integer, so
let $p$ be a prime dividing the denominator of that coefficient of $f(x)$. Let $p^a$ and
$p^b$ be the highest powers of $p$ dividing the denominator of any coefficient of $f(x)$
and $g(x)$, respectively, so that $a \ge 1$ and $b \ge 0$. Therefore we may write $p^a f(x) \equiv f_A x^A + \cdots \pmod p$,
where $f_A x^A$ is the term of highest degree whose coefficient is not divisible by $p$ (so $f_A \not\equiv 0 \pmod p$), and similarly $p^b g(x) \equiv g_B x^B + \cdots \pmod p$
where $g_B \not\equiv 0 \pmod p$. Now $a + b \ge 1$ and $f(x)g(x) \in \mathbb{Z}[x]$, so that

$$0 \equiv p^{a+b} f(x)g(x) \equiv (f_A x^A + \cdots)(g_B x^B + \cdots) \equiv f_A g_B x^{A+B} + \cdots \pmod p,$$

which implies that $p$ divides $f_A g_B$, a contradiction.


Lemma 3.22.2 implies that, in Proposition 3.22.1, we can conclude that $f(x)$ divides $g(x)$
in $\mathbb{Z}[x]$.


Corollary 3.22.1 (Gauss’s Lemma). Suppose that $f(x) \in \mathbb{Z}[x]$ is monic. Then
$f(x)$ is irreducible in $\mathbb{Z}[x]$ if and only if $f(x)$ is irreducible in $\mathbb{Q}[x]$.


Proof. One direction is trivial. We need only prove that if $f(x)$ is irreducible
in $\mathbb{Z}[x]$, then it is irreducible in $\mathbb{Q}[x]$. If not, then $f(x) = g(x)h(x)$ for some
polynomials $g(x), h(x) \in \mathbb{Q}[x]$ of smaller degree, which we may assume are monic by dividing them
each by their leading coefficients. But then $g(x), h(x) \in \mathbb{Z}[x]$ by Lemma 3.22.2, contradicting the irreducibility of $f(x)$ in $\mathbb{Z}[x]$.




Unique factorization of polynomials modulo p 


We can reduce polynomials in $\mathbb{Z}[x]$ to polynomials mod $p$ by defining $f(x) \equiv g(x) \pmod p$
if and only if there exists $h(x) \in \mathbb{Z}[x]$ for which $f(x) - g(x) = ph(x)$.

If $g(x)$ is a polynomial whose coefficients belong to residue classes mod $p$, then
we can write $g(x)$ as a polynomial with integer coefficients (by taking representatives
of the residue classes), but with the understanding that we are studying $g(x)$ mod
$p$. If the leading coefficient $c$ of $g(x)$ satisfies $c \not\equiv 0 \pmod p$, let $d \equiv 1/c \pmod p$, and so
$G(x) := dg(x) \pmod p$ is monic.

Given polynomials $f(x), g(x) \pmod p$,¹⁵ select $F(x), G(x) \in \mathbb{Z}[x]$ with $G$ monic
for which $F(x) \equiv f(x) \pmod p$ and $G(x) \equiv dg(x) \pmod p$. We then apply exercise
2.10.5(b), so there exist $Q(x), R(x) \in \mathbb{Z}[x]$ with $F = GQ + R$ and $\deg R < \deg G$.
Hence if $q \equiv dQ \pmod p$ and $r \equiv R \pmod p$, we have $f \equiv gq + r \pmod p$ with
$\deg r < \deg g$. This means that the Euclidean algorithm works for polynomials
mod $p$, that we can prove the analogy to Euclid’s Lemma, and that polynomials
mod $p$ have unique factorization. (Readers should convince themselves of the
details.)
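The division step just described (scale by $d \equiv 1/c$, then divide) is enough to run the Euclidean algorithm mod $p$. The Python sketch below is our own illustration, not code from the text: polynomials are lists of integers with the $x^0$ coefficient first, $p$ is assumed prime, and it uses Python 3.8+'s `pow(c, -1, p)` for the modular inverse. It is applied to $x^2 - 2$ and $4x^2 + 1$, the pair discussed in section 3.23.

```python
def trim(f):
    """Remove zero coefficients of highest degree (lists store x^0 first)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def divmod_mod_p(f, g, p):
    """Return (q, r) with f = g*q + r (mod p) and deg r < deg g.

    p must be prime and g must not vanish mod p. As in the text, we multiply
    by d = 1/c (mod p), where c is the leading coefficient of g.
    """
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    d = pow(g[-1], -1, p)            # modular inverse (Python 3.8+)
    q = [0] * max(len(f) - len(g) + 1, 1)
    r = f
    while r and len(r) >= len(g):
        shift = len(r) - len(g)
        c = (r[-1] * d) % p          # coefficient that kills the top term of r
        q[shift] = c
        for i, gi in enumerate(g):
            r[shift + i] = (r[shift + i] - c * gi) % p
        r = trim(r)
    return trim(q), r


def gcd_mod_p(f, g, p):
    """Monic gcd of f and g in (Z/pZ)[x], by the Euclidean algorithm."""
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    while g:
        f, g = g, divmod_mod_p(f, g, p)[1]
    if not f:
        return []
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]


print(gcd_mod_p([-2, 0, 1], [1, 0, 4], 3))   # [1, 0, 1], i.e. x^2 + 1
print(gcd_mod_p([-2, 0, 1], [1, 0, 4], 5))   # [1]
```

Mod 3 the two polynomials share the factor $x^2 + 1$ even though neither has a root mod 3, while mod 5 they are coprime.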


3.23. Interpreting resultants and discriminants 


In sections 2.10 and 2.11 of appendix 2B we noted that if $f$ and $g$ are polynomials
with integer coefficients which have no common root, then there exist polynomials
$a, b \in \mathbb{Z}[x]$ with $\deg a < \deg g$ and $\deg b < \deg f$ and a positive integer $R$ for which


$$a(x)f(x) + b(x)g(x) = R. \qquad (3.23.1)$$
We call the smallest such $R$ the resultant of $f$ and $g$.

Now, let us suppose that there exists an integer $m$ such that $f(m) \equiv g(m) \equiv 0 \pmod p$.
Substituting $x = m$ into (3.23.1), we see that $p$ divides $R$. But what if
$f$ and $g$ have no common root mod $p$ but do have a common non-linear factor?
For example, if $f(x) = x^2 - 2$ and $g(x) = 4x^2 + 1$, then they are both $\equiv x^2 + 1 \pmod 3$,
which has no roots mod 3, but then $f$ and $g$ do have a common factor
mod 3, which is why 3 divides $R = g - 4f = 9$. This all follows from


Proposition 3.23.1. Suppose that $f(x), g(x) \in \mathbb{Z}[x]$ have no common roots. Then
a prime $p$ divides the resultant of $f$ and $g$ if and only if $f$ and $g$ have a common
polynomial factor mod $p$.


Proof. Suppose that $f$ and $g$ have a common non-constant factor $h(x)$ mod $p$ (which is not
necessarily a linear polynomial). Therefore $h(x)$ divides $af + bg = R$ mod $p$. But
since $\deg R = 0$ this is impossible, unless $R \equiv 0 \pmod p$.

Now suppose that prime $p$ divides $R$, so that $a(x)f(x) \equiv -b(x)g(x) \pmod p$.
Hence $f(x)$ divides $b(x)g(x) \pmod p$, but $f$ has higher degree than $b$ and so it
must have some factor in common with $g(x) \pmod p$, by the unique factorization
of polynomials mod $p$.


15 More precisely, elements of $(\mathbb{Z}/p\mathbb{Z})[x]$.




This is an extraordinary result. It tells us that the prime factors of $R$ are
exactly the primes $p$ for which $f$ and $g$ have a common factor mod $p$; and a study of
the prime powers dividing $R$ can tell us more about multiple factors.¹⁶


The discriminant of f is defined to be the resultant of f and f’. We immediately 
deduce the following: 


Corollary 3.23.1. Suppose that $f(x) \in \mathbb{Z}[x]$ has no repeated roots. Then prime $p$
divides $\Delta$, the discriminant of $f$, if and only if $f$ has a repeated polynomial factor
mod $p$.
A few examples: If $f(x) = ax^2 + bx + c$, then $f'(x) = 2ax + b$ and so
$$(2ax + b)(2ax + b) - 4a(ax^2 + bx + c) = b^2 - 4ac.$$
If $f(x) = x^3 + ax + b$, then $f'(x) = 3x^2 + a$ and so
$$9(3b - 2ax)(x^3 + ax + b) + (6ax^2 - 9bx + 4a^2)(3x^2 + a) = 4a^3 + 27b^2.$$
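Both identities can be confirmed by multiplying out; the Python sketch below (our own check, with integer coefficient lists stored $x^0$ first) does this for a range of integer parameters, confirming that every non-constant coefficient cancels.

```python
def pmul(p, q):
    """Product of integer polynomials given as coefficient lists, x^0 first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def padd(p, q):
    """Sum of two coefficient lists, with zero top coefficients removed."""
    n = max(len(p), len(q))
    out = [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out


def quadratic_combination(a, b, c):
    """(2ax + b)(2ax + b) - 4a(ax^2 + bx + c); expected to be the constant b^2 - 4ac."""
    fprime, f = [b, 2 * a], [c, b, a]
    return padd(pmul(fprime, fprime), pmul([-4 * a], f))


def cubic_combination(a, b):
    """9(3b - 2ax) f + (6ax^2 - 9bx + 4a^2) f' for f = x^3 + ax + b."""
    f, fprime = [b, a, 0, 1], [a, 0, 3]
    return padd(pmul([27 * b, -18 * a], f), pmul([4 * a * a, -9 * b, 6 * a], fprime))


print(quadratic_combination(1, 0, -2))   # [8]
print(cubic_combination(-1, 1))          # [23]
```

A single-entry list means the combination really is a constant, as the text claims.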


3.24. Other approaches to resultants and gcds 


The polynomial common factor of highest degree of $f$ and $f'$ can be obtained by
using the Euclidean algorithm but can also be described as follows: Suppose that
$f(x) \in \mathbb{Z}[x]$ has an irreducible factor $p(x)$ and that $p(x)^e \,\|\, f(x)$. Therefore we can
write $f(x) = p(x)^e g(x)$ where $p(x) \nmid g(x)$, and so

$$f'(x) = p(x)^e g'(x) + e\,p(x)^{e-1}p'(x)g(x) = p(x)^{e-1}\big(p(x)g'(x) + e\,p'(x)g(x)\big).$$


Now $(p(x), p(x)g'(x) + e\,p'(x)g(x)) = (p(x), e\,p'(x)g(x)) = 1$, since the irreducible $p(x)$ divides neither $g(x)$ nor $p'(x)$ (which has smaller degree), and so $p(x)^{e-1} \,\|\, f'(x)$. This
implies that $p(x)^{e-1} \,\|\, (f'(x), f(x))$ and so


$$\gcd(f(x), f'(x)) = \prod_{\substack{p(x)^e \,\|\, f(x)\\ p(x)\ \text{irreducible}}} p(x)^{e-1},$$
and therefore is divisible by any repeated factor of $f(x)$.


We are more interested here in the case where we are given two polynomials 
that have no common factor. We begin with resultants. Suppose we are given 
polynomials $f(x) = \sum_{i=0}^{d} f_i x^i$ and $g(x) = \sum_{j=0}^{D} g_j x^j$ and we wish to know if there
exist polynomials $a(x) = \sum_{i=0}^{D-1} a_i x^i$ and $b(x) = \sum_{j=0}^{d-1} b_j x^j$ for which $af + bg$ is a
non-zero integer. The polynomials $a$ and $b$ contain $d + D$ variables to be assigned
values, and the polynomial $af + bg$ has degree $d + D - 1$ and so $d + D - 1$ coefficients (those of $x, x^2, \ldots, x^{d+D-1}$)
that must be 0. Each of the coefficients is a linear polynomial in the $a_i$ and $b_j$, so
we can determine whether or not there is a solution by linear algebra. This is all
discussed in detail in exercises 29 to 32 of section 14.6 of [DF04]. They also prove
some lovely formulas for $R'$ (a small multiple of $R$):


$$R' = f_d^{D} g_D^{d} \prod_{i=1}^{d}\prod_{j=1}^{D} (\alpha_i - \beta_j) = f_d^{D} \prod_{i=1}^{d} g(\alpha_i) = (-1)^{dD} g_D^{d} \prod_{j=1}^{D} f(\beta_j),$$


16 The usual textbook definition of resultant (see, e.g., exercise 29 of section 14.6 of [DF04]) allows
one to find polynomials $a$ and $b$ for which $af + bg$ is a non-zero integer; call it $R'$. However $R'$ can be
a multiple of $R$. For example, if $f(x) = 3x$ and $g(x) = 3x + 1$, then we have $g - f = 1 = R$ but $R' = 3$.
The reason for this is that the textbook definition also determines when there are common factors of
the leading coefficients of $f$ and $g$.




where the $\alpha_i$ are the roots of $f$ and the $\beta_j$ are the roots of $g$, both counted with
multiplicity. As a consequence, if $f(x)$ has no repeated factors and $g = f'$, then

$$R' = f_d^{\,d-1} \prod_{i=1}^{d} f'(\alpha_i),$$
a multiple of the discriminant of $f$.
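The textbook quantity $R'$ is the determinant of the linear system described above, usually written as the Sylvester matrix. As a sketch under our own conventions (coefficients listed from the highest degree down, the $D$ shifted copies of $f$ placed above the $d$ shifted copies of $g$, which gives $R' = f_d^D \prod g(\alpha_i)$), here is an exact computation in Python; the function names are ours.

```python
from fractions import Fraction


def determinant(rows):
    """Exact determinant by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, det = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def sylvester_resultant(f, g):
    """det of the Sylvester matrix of f and g (coefficients listed highest degree first).

    The D = deg g shifted copies of f come first, then d = deg f copies of g,
    which gives f_d^D * prod g(alpha_i) over the roots alpha_i of f.
    """
    d, D = len(f) - 1, len(g) - 1
    n = d + D
    rows = [[0] * i + list(f) + [0] * (n - d - 1 - i) for i in range(D)]
    rows += [[0] * i + list(g) + [0] * (n - D - 1 - i) for i in range(d)]
    return determinant(rows)


print(sylvester_resultant([1, 0, -2], [4, 0, 1]))   # 81
print(sylvester_resultant([3, 0], [3, 1]))          # 3
```

For $f = x^2 - 2$ and $g = 4x^2 + 1$ this gives $R' = 81$, a multiple of $R = 9$; for $f(x) = 3x$ and $g(x) = 3x + 1$ it gives $R' = 3$ although $g - f = 1$. With $f = x^3 + x + 1$ and $g = f'$ it returns 31, matching $4a^3 + 27b^2$ for $a = b = 1$.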


Additional exercises 


Exercise 3.24.1. Suppose that $f(x) \in \mathbb{Z}[x]$.

(a) Show that if $f$ has an integer root $n$, then $f(n) \equiv 0 \pmod m$ for any integer $m$.

(b) Suppose that $f$ has a rational root $r/s$, where $r$ and $s$ are coprime integers. Show that if
$(s, m) = 1$, then there exists an integer $n$ such that $f(n) \equiv 0 \pmod m$.

(c) For each integer $m$ give an example of a polynomial $f$ which has a rational root $r/s$ with
$(r, s) = 1$, but for which there does not exist an integer $n$ such that $f(n) \equiv 0 \pmod m$.


Exercise 3.24.2. Suppose that $f(x) \in \mathbb{Z}[x]$ has degree $d$, and let $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ with $\alpha\delta - \beta\gamma = 1$.

(a) Show that $f(x)$ is irreducible if and only if $x^d f(1/x)$ is irreducible.

(b) Show that $f(x)$ is irreducible if and only if $(\gamma x + \delta)^d f\left(\frac{\alpha x + \beta}{\gamma x + \delta}\right)$ is irreducible. (Remark: The
easy way to prove this is to know that all such transformations can be given as a composition
of the transformations $z \mapsto z + 1$ and $z \mapsto -1/z$, as we saw in section 1.10.)


Exercise 3.24.3.
(a) Give examples of cubic polynomials in $\mathbb{Z}[x]$ with no roots in $\mathbb{Z}$.
(b) Give examples of cubic polynomials in $\mathbb{Z}[x]$ with all three roots in $\mathbb{Z}$.
(c) Give examples of cubic polynomials in $\mathbb{Z}[x]$ with exactly one root in $\mathbb{Z}$.
(d) Prove that there are no examples with precisely two roots in $\mathbb{Z}$.
(e) Answer these same questions for cubic polynomials when $\mathbb{Z}$ is replaced by $\mathbb{Z}/p\mathbb{Z}$ for some
odd prime $p$.


Appendix 3G. Open problems 


3.25. The Frobenius postage stamp problem, II 


Suppose that a and b are positive coprime integers. We wish to better understand 
the sets 


$$P(a,b) := \{am + bn : m, n \in \mathbb{Z},\ m, n \ge 0\} = \{N \ge 0\} \setminus E(a,b).$$


Theorem 3.12. If $a > b > 1$ are coprime integers, then $ab - a - b$ is the largest
integer in $E(a,b)$.


Proof. For a given integer $N$, let $j$ be the least residue of $N/a \pmod b$. We will
show that if we can write $N = am + bn$ for integers $m, n \ge 0$, then there is such
a representation of $N$ with $m = j$: Since $am \equiv N \equiv aj \pmod b$, therefore $m \equiv j \pmod b$
and so $m \ge j$. We may write $m = j + bi$ for some integer $i \ge 0$, and so
$N = a(j + bi) + bn = aj + b(n + ai) = aj + bk$ for some $k \ge n \ge 0$, as desired.
We deduce that
$$P(a,b) = \bigcup_{j=0}^{b-1} \{N \in \mathbb{Z} : N \equiv aj \pmod b \text{ and } N \ge aj\}.$$
Therefore
$$E(a,b) = \bigcup_{j=0}^{b-1} \{N \in \mathbb{Z} : N \equiv aj \pmod b \text{ and } 0 \le N < aj\}.$$
The largest element of the $j$th set in $E(a,b)$ is $aj - b$ (provided this is $\ge 0$). Hence the
largest element of $E(a,b)$ is $a(b-1) - b = ab - a - b$.
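A brute-force computation makes Theorem 3.12 easy to test on examples, together with the count $\frac{1}{2}(a-1)(b-1)$ from exercise 3.25.2(d). In this Python sketch (the function name is ours) we search only up to $ab$, which suffices because the theorem places every element of $E(a,b)$ below $ab$.

```python
from math import gcd


def exceptional_set(a, b):
    """E(a, b): the integers N >= 0 that are not of the form am + bn with m, n >= 0.

    Brute force over all am + bn <= ab; this search bound is enough because
    Theorem 3.12 puts every element of E(a, b) below ab.
    """
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    bound = a * b
    represented = {a * m + b * n
                   for m in range(bound // a + 1)
                   for n in range(bound // b + 1)
                   if a * m + b * n <= bound}
    return [N for N in range(bound + 1) if N not in represented]


print(exceptional_set(5, 3))   # [1, 2, 4, 7]
```

For $a = 5$, $b = 3$ the largest missing value is $15 - 5 - 3 = 7$, and there are $\frac{1}{2}\cdot 4\cdot 2 = 4$ missing values.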


Exercise 3.25.1. Let $a$ and $b$ be positive integers with $g = \gcd(a,b)$. Prove that $ab/g - a - b$ is
the largest integer, divisible by $g$, that is not represented as $am + bn$ with $m, n \ge 0$.


Exercise 3.25.2. Let $a$ and $b$ be positive coprime integers. Suppose that $r = ma + nb$ for some
integers $m, n$ in the ranges $1 \le m \le b - 1$ and $1 \le n \le a - 1$.
(a) Prove that $r$ is the smallest integer in $P(a,b)$ which is $\equiv r \pmod{ab}$.




(b) What is the smallest integer $s \in P(a,b)$ which is $\equiv -r \pmod{ab}$?

(c) Show that exactly one of $r$ and $s$ is $< ab$, and deduce that if $1 \le N < ab$ where $a \nmid N$ and
$b \nmid N$, then exactly one of $N$ and $ab - N$ belongs to $P(a,b)$.

(d) Show that there are exactly $\frac{1}{2}(a-1)(b-1)$ elements of $E(a,b)$.


Exercise 3.25.3. (This continues from exercise 1.16.2.) An integer $n$ is powerful if $p^2$ divides $n$
whenever prime $p$ divides $n$. Prove that we can write any powerful integer $n$ as $n = a^2b^3$ where $a$
and $b$ are integers.


Exercise 3.25.4. Let $a$ and $b$ be positive coprime integers and select $r$ and $s$ in the ranges
$1 \le r \le b - 1$ and $1 \le s \le a - 1$ so that $ar \equiv 1 \pmod b$ and $bs \equiv 1 \pmod a$. Prove that there
are
$$\frac{N}{ab} + 1 - \left\{\frac{rN}{b}\right\} - \left\{\frac{sN}{a}\right\}$$
representations of $N$ as $N = ma + nb$ with integers $m, n \ge 0$.
Exercise 3.25.5. Suppose that $a_1, \ldots, a_k$ are positive integers for which $\gcd(a_1, \ldots, a_k) = 1$.
Let
$$P(a_1, \ldots, a_k) = \{a_1n_1 + \cdots + a_kn_k : n_1, \ldots, n_k \in \mathbb{Z},\ n_1, \ldots, n_k \ge 0\}.$$
(a) Prove that there exists an integer $N$ such that every integer $> N$ belongs to $P(a_1, \ldots, a_k)$.
(b) Show that we may take $N = (k-1)\,\mathrm{lcm}[a_1, \ldots, a_k]$.


It is not easy to prove the analogy to Theorem 3.12 for sets of three or more integers
$a_i$. In fact, determining a formula for the largest integer that does not belong to
$P(a,b,c)$ (for arbitrary $a,b,c$) is an open problem.


3.26. Egyptian fractions for 3/b 


In section 1.17 of appendix 1E we showed that every fraction $a/b$ with $(a,b) = 1$
can be written as a sum of at most $a$ distinct fractions of the form $1/n$. We proved
that this is best possible for $a = 1$ and $2$, and we do so now for $a = 3$:


Theorem 3.13. Suppose that $b$ is a prime. Then $3/b$ can be written as the sum of
two distinct Egyptian fractions if and only if $b \equiv -1 \pmod 3$.


Proof. Suppose that $3/b = 1/m + 1/n$ with $m < n$. Let $g = \gcd(m,n)$ so that
$m = gM$, $n = gN$ with $\gcd(M,N) = 1$, and therefore $3gMN = (M + N)b$. We
have $(M + N, M) = (M + N, N) = (M, N) = 1$ and so $MN$ divides the prime $b$.
As $M < N$ this implies that $M = 1$ and $N = b$, and therefore $b = 3g - 1 \equiv -1 \pmod 3$.


On the other hand, if $b = 3k - 1$, then

$$\frac{3}{b} = \frac{1}{k} + \frac{1}{k(3k-1)}.$$


Exercise 3.26.1. Fix integer $a > 3$. Suppose that $b$ is a prime. Prove that $a/b$ can be written
as the sum of two distinct Egyptian fractions if and only if $b \equiv -1 \pmod a$.
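Theorem 3.13 can be checked on small primes by brute force. The Python sketch below is our own illustration: for $m < n$ we need $1/m > a/(2b)$, so only finitely many $m$ are tried, and each time we test whether $a/b - 1/m$ is a unit fraction.

```python
from fractions import Fraction


def two_term_splittings(a, b):
    """All pairs m < n with a/b = 1/m + 1/n, found by brute force over m."""
    target = Fraction(a, b)
    found = []
    m = 1
    while Fraction(1, m) > target / 2:        # m < n forces 1/m > a/(2b)
        rest = target - Fraction(1, m)
        if rest > 0 and rest.numerator == 1:  # rest = 1/n, and n > m by the loop condition
            found.append((m, rest.denominator))
        m += 1
    return found


print(two_term_splittings(3, 5))   # [(2, 10)]
print(two_term_splittings(3, 7))   # []
```

For $b = 5 = 3\cdot 2 - 1$ the splitting found is $1/2 + 1/10$, the one given by the identity in the proof with $k = 2$; for $b = 7 \equiv 1 \pmod 3$ there is none.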


Exercise 3.26.2. Suppose that $a$ and $b$ are coprime positive integers. Prove that we have a
solution to $\frac{a}{b} = \frac{1}{m} + \frac{1}{n}$ with $(m,n) = 1$ if and only if $a^2 - 4b$ is the square of an integer.


3.27. The 3x + 1 conjecture
Start with any positive integer $n$ and transform it according to the following map:

$$n \mapsto \begin{cases} 3n + 1 & \text{if } n \text{ is odd},\\ n/2 & \text{if } n \text{ is even}.\end{cases}$$




Iterate. The 3x + 1 conjecture states that one eventually gets down to 1. For
example

$$99 \to 298 \to 149 \to 448 \to 224 \to 112 \to 56 \to 28 \to 14 \to 7 \to 22 \to 11$$
$$\to 34 \to 17 \to 52 \to 26 \to 13 \to 40 \to 20 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1,$$


a long and circuitous route, called the orbit of 99. (Note that $1 \to 4 \to 2 \to 1$.) This
is what makes the 3x + 1 conjecture so difficult; there seems to be no formula that 
helps understand where the iterates go in the long run. And, in many examples, 
the elements of the orbit do not get smaller quickly; indeed they often get quite 
a bit larger than the number that one started with, before decreasing to 1. One 
can prove that there are orbits in which the numbers get larger than the original 
integer by an arbitrary factor. 


Before proving this we modify the 3x + 1 algorithm for convenience: Given $n$
odd we transform to $3n + 1$, but given even $n$ we divide out as many powers of 2
as possible, all in one go. Thus the above orbit now looks like

$$99 \to 298 \to 149 \to 448 \to 7 \to 22 \to 11 \to 34 \to 17 \to 52 \to 13 \to 40 \to 5 \to 16 \to 1.$$
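Both orbits above can be reproduced directly. The Python sketch below (our own illustration) implements the original map and the modified one; note that the loops only stop when 1 is reached, so they would run forever on a hypothetical counterexample to the conjecture.

```python
def orbit(n):
    """Orbit of n under the 3x+1 map, stopping at 1."""
    seq = [n]
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        seq.append(n)
    return seq


def modified_orbit(n):
    """Modified map: odd n -> 3n + 1, even n -> n with every factor of 2 removed."""
    seq = [n]
    while n != 1:
        if n % 2:
            n = 3 * n + 1
        else:
            while n % 2 == 0:
                n //= 2
        seq.append(n)
    return seq


print(len(orbit(99)) - 1, max(orbit(99)))   # 25 448
print(modified_orbit(99))
# [99, 298, 149, 448, 7, 22, 11, 34, 17, 52, 13, 40, 5, 16, 1]
```

The original orbit of 99 takes 25 steps and climbs to 448 before descending.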


Exercise 3.27.1. Let $x_0 = 2^k m - 1$. Suppose that the iterates of the modified 3x + 1 algorithm
go $x_0 \to y_1 \to x_1 \to y_2 \to x_2 \to \cdots$.
(a) Prove that $x_j = 3^j 2^{k-j} m - 1$ for $j = 0, 1, \ldots, k$.
(b) Deduce that there exist integers $x_0$ such that there is an $n$th iterate $x_n$ for which $x_n/x_0$ is
arbitrarily large.


Exercise 3.27.2. Prove that $2^k \,\|\, N$ if and only if $N \equiv 2^k \pmod{2^{k+1}}$.


To analyze how quickly we expect the numbers in the modified 3x + 1 algorithm
to decrease, we now determine the expected size of the largest odd part of $3n + 1$.

The probability that, for a random odd integer $n$, we have $3n + 1 \equiv 2^k \pmod{2^{k+1}}$
is $1/2^k$ (since we are guaranteed that $3n + 1$ is even), by exercise 3.27.2.
Hence the expected size of the largest odd part of $3n + 1$ is

$$\sum_{k \ge 1} \frac{1}{2^k}\cdot\frac{3n+1}{2^k} = (3n+1)\sum_{k \ge 1}\frac{1}{4^k} = n + \frac{1}{3}.$$

Of course it is never near to this size (the possibilities are $\frac{3n+1}{2}, \frac{3n+1}{4}, \frac{3n+1}{8}, \ldots$),
but this is the average of those possibilities.
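Both ingredients of this heuristic can be tested numerically. Among the odd $n < 2^{16}$, exactly a proportion $1/2^k$ have $2^k \,\|\, 3n + 1$ for each $k \le 15$, and the mean of (largest odd part of $3n + 1$)$/n$ should be close to $3\sum_{k\ge 1} 4^{-k} = 1$. The Python sketch below is our own check; the choice of the range $n < 2^{16}$ is arbitrary.

```python
def odd_part(m):
    """Largest odd divisor of m >= 1."""
    while m % 2 == 0:
        m //= 2
    return m


def two_adic_counts(limit, kmax):
    """counts[k] = #{odd n < limit : 2^k exactly divides 3n + 1}, for 1 <= k <= kmax."""
    counts = [0] * (kmax + 1)
    for n in range(1, limit, 2):
        m, k = 3 * n + 1, 0
        while m % 2 == 0:
            m //= 2
            k += 1
        if k <= kmax:
            counts[k] += 1
    return counts


def average_ratio(limit):
    """Mean of (odd part of 3n + 1) / n over odd n < limit."""
    odds = range(1, limit, 2)
    return sum(odd_part(3 * n + 1) / n for n in odds) / len(odds)


counts = two_adic_counts(2 ** 16, 6)
print([c / 2 ** 15 for c in counts[1:]])   # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
print(round(average_ratio(2 ** 16), 3))    # close to 1
```

The exact proportions $1/2^k$ reflect the fact that $3n + 1 \equiv 2^k \pmod{2^{k+1}}$ picks out a single residue class of $n$ modulo $2^{k+1}$.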


Similar probabilistic models suggest that if an orbit starts from $n$, then all
numbers in the orbit are $< Cn^2$ for some large constant $C$. The orbit that begins
with $n = 1980976057694848447$ contains a number which is almost $25n^2$.


It is not impossible that there exists some integer n > 1, such that after some 
iterate we return to n, and therefore the 3x + 1 conjecture is false. If such an 
example exists, then it is known that the cycle has period at least 10 billion! 




Further reading on these open problems 


[1] Ronald L. Graham, Paul Erdős and Egyptian fractions, Bolyai Soc. Math. Stud. 25 (2013), 289-309.


[2] Jeffrey C. Lagarias, The 3x + 1 problem and its generalizations, Amer. Math. Monthly 92 (1985), 3-23.


[3] Alfonsin Ramirez, The Diophantine Frobenius problem, Oxford Lecture Series in Mathematics and 
its Applications 30, Oxford University Press, Oxford, 2005. 


Chapter 4 


Multiplicative functions 


In the previous chapter we discussed $\tau(n)$, which counts the number of divisors of $n$.
We discovered that $\tau(n)$ is a multiplicative function, which allowed us to calculate
its value fairly easily. Multiplicative functions, so called since


$$f(mn) = f(m)f(n) \text{ for all coprime positive integers } m \text{ and } n,$$


play a central role in number theory. (Moreover, $f$ is totally multiplicative, or
completely multiplicative, if $f(mn) = f(m)f(n)$ for all integers $m, n \ge 1$.) Thus
the divisor function $\tau(n)$ is multiplicative but not totally multiplicative, since
$\tau(p^a) = a + 1$, and so $\tau(p^2) = 3$ is not equal to $\tau(p)^2 = 2^2$. Common examples
of totally multiplicative functions include $f(n) = 1$, $f(n) = n$, and $f(n) = n^s$
for a fixed complex number $s$. Another is Liouville’s function $\lambda(n)$, which equals $-1$
to the power of the total number of prime factors of $n$, counting repetitions of
the same prime factor. For example $\lambda(2) = \lambda(3) = \lambda(12) = \lambda(32) = -1$ and
$\lambda(4) = \lambda(6) = \lambda(10) = \lambda(60) = 1$.
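The examples above are easy to reproduce from prime factorizations. This Python sketch (names ours, factorization by plain trial division) computes $\lambda(n)$ and $\tau(n)$ and illustrates that $\tau$ is multiplicative but not totally multiplicative.

```python
def factorize(n):
    """Prime factorisation of n >= 1 by trial division, as a dict {p: e}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def liouville(n):
    """lambda(n) = (-1)^(number of prime factors of n, counted with repetition)."""
    return (-1) ** sum(factorize(n).values())


def tau(n):
    """Number of divisors of n: the product of (e + 1) over p^e || n."""
    result = 1
    for e in factorize(n).values():
        result *= e + 1
    return result


print([liouville(n) for n in (2, 3, 12, 32, 4, 6, 10, 60)])   # [-1, -1, -1, -1, 1, 1, 1, 1]
print(tau(4), tau(2) ** 2)                                     # 3 4
```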

What makes multiplicative functions central to number theory is that one can
evaluate a multiplicative function $f(n)$ in terms of the $f(p^e)$ for the prime powers
$p^e$ exactly dividing $n$.


Exercise 4.0.1. Show that if $f$ is multiplicative and $n = \prod_{p \text{ prime}} p^{e_p}$, then

$$f(n) = \prod_{p \text{ prime}} f(p^{e_p}).$$

Deduce that if $f$ is totally multiplicative, then $f(n) = \prod_p f(p)^{e_p}$.


Exercise 4.0.2. Prove that if $f$ is a multiplicative function, then either $f(n) = 0$ for all $n \ge 1$ or
$f(1) = 1$.


Exercise 4.0.3. Prove that if $f$ and $g$ are multiplicative functions, then so is $h$, where $h(n) = f(n)g(n)$
for all $n \ge 1$.


Exercise 4.0.4. Prove that if f is completely multiplicative and d|n, then f(d) divides f(n). 




Exercise 4.0.5. Prove that if $f$ is multiplicative and $a$ and $b$ are any two positive integers, then

$$f((a,b))\,f([a,b]) = f(a)f(b).$$


In this chapter we will focus on two multiplicative functions of great interest. 


4.1. Euler’s φ-function
There are

$$\phi(n) := \#\{m : 1 \le m \le n \text{ and } (m,n) = 1\}$$
elements in any reduced system of residues mod $n$. Obviously $\phi(1) = 1$.


Lemma 4.1.1. $\phi(n)$ is a multiplicative function.


Proof. Suppose that $n = mr$ where $(m,r) = 1$. By the Chinese Remainder Theorem
(Theorem 3.6) there is a natural bijection between the integers $a \pmod n$ with
$(a,n) = 1$ and the pairs of integers $(b \pmod m, c \pmod r)$ with $(b,m) = (c,r) = 1$.
Since there are $\phi(m)\phi(r)$ such pairs $(b,c)$ we deduce that $\phi(n) = \phi(m)\phi(r)$.


Hence to evaluate $\phi(n)$ for all $n$ we simply need to evaluate it on the prime
powers, by exercise 4.0.1. This is straightforward because $(m, p^e) = 1$ if and only
if $(m,p) = 1$; and $(m,p) = 1$ is not satisfied if and only if $p$ divides $m$. Therefore

$$\begin{aligned}\phi(p^e) &= \#\{m : 1 \le m \le p^e \text{ and } (m,p) = 1\}\\ &= \#\{m : 1 \le m \le p^e\} - \#\{m : 1 \le m \le p^e \text{ and } p \mid m\}\\ &= p^e - p^{e-1}\end{aligned}$$

by exercise 1.7.6(c). We deduce the following:


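Theorem 4.1. If $n = \prod_{p \text{ prime}} p^{e_p}$, then
$$\phi(n) = \prod_{p \mid n}\left(p^{e_p} - p^{e_p - 1}\right) = \prod_{p \mid n} p^{e_p}\left(1 - \frac{1}{p}\right) = n\prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$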


Example. $\phi(60) = 60\cdot\left(1 - \frac{1}{2}\right)\left(1 - \frac{1}{3}\right)\left(1 - \frac{1}{5}\right) = 16$, the least positive residues
being
1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, and 59.
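Theorem 4.1 turns $\phi(n)$ into a computation over the prime factors of $n$. The Python sketch below (our own illustration) compares the product formula, evaluated with exact integer steps, against the definition.

```python
from math import gcd


def phi_definition(n):
    """phi(n) from the definition: #{1 <= m <= n : (m, n) = 1}."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def phi_product(n):
    """phi(n) = n * prod_{p | n} (1 - 1/p), using exact integer steps."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p    # multiply by (1 - 1/p); p still divides result here
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                        # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result


print(phi_product(60))                                  # 16
print([m for m in range(1, 61) if gcd(m, 60) == 1])     # the 16 residues listed above
```

Each step `result -= result // p` is exact because $p$ divides $n$ while the factors already removed are primes different from $p$.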


We give an alternative proof of Theorem 4.1, based on the inclusion-exclusion principle,
in section 4.5.
Studying the values taken by $\phi(n)$, one makes a surprising observation:


Proposition 4.1.1. We have $\sum_{d \mid n} \phi(d) = n$.


Example. For $n = 30$, we have

$$\phi(1) + \phi(2) + \phi(3) + \phi(5) + \phi(6) + \phi(10) + \phi(15) + \phi(30) = 1 + 1 + 2 + 4 + 2 + 4 + 8 + 8 = 30.$$




Proof. Given any integer $m$ with $1 \le m \le n$, let $d = n/(m,n)$, which divides $n$.
Then $(m,n) = n/d$, so one can write $m = an/d$ with $(a,d) = 1$ and $1 \le a \le d$. Now,
for each divisor $d$ of $n$ the number of integers $m$ for which $(m,n) = n/d$ equals the
number of integers $a$ for which $(a,d) = 1$ and $1 \le a \le d$, which is $\phi(d)$ by definition.
We have therefore shown that

$$\begin{aligned} n &= \#\{m : 1 \le m \le n\} = \sum_{d \mid n} \#\{m : 1 \le m \le n \text{ and } (m,n) = n/d\}\\ &= \sum_{d \mid n} \#\{m : m = a(n/d),\ 1 \le a \le d \text{ and } (a,d) = 1\}\\ &= \sum_{d \mid n} \#\{a : 1 \le a \le d \text{ and } (a,d) = 1\} = \sum_{d \mid n} \phi(d),\end{aligned}$$

which is the result claimed.
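The proof sorts the integers $1 \le m \le n$ into classes according to $d = n/(m,n)$. This Python sketch (our own illustration) performs that sorting for $n = 30$, so one can see that each class has $\phi(d)$ members and that the class sizes add up to $n$.

```python
from math import gcd


def phi(n):
    """phi(n) from the definition."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def classes_by_divisor(n):
    """Group 1 <= m <= n by d = n / (m, n), as in the proof of Proposition 4.1.1."""
    classes = {}
    for m in range(1, n + 1):
        classes.setdefault(n // gcd(m, n), []).append(m)
    return classes


def divisor_phi_sum(n):
    """Sum of phi(d) over the divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)


print({d: len(ms) for d, ms in sorted(classes_by_divisor(30).items())})
# {1: 1, 2: 1, 3: 2, 5: 4, 6: 2, 10: 4, 15: 8, 30: 8}
print(divisor_phi_sum(30))   # 30
```

The class sizes reproduce the terms $1 + 1 + 2 + 4 + 2 + 4 + 8 + 8$ in the example for $n = 30$.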


Exercise 4.1.1. Prove that if $d \mid n$, then $\phi(d)$ divides $\phi(n)$.


Exercise 4.1.2. Prove that if $n$ is odd and $\phi(n) \equiv 2 \pmod 4$, then $n$ has exactly one prime
factor (perhaps repeated several times).


Exercise 4.1.3. Prove that $\sum_{1 \le m \le n,\ (m,n) = 1} m = n\phi(n)/2$ and $\prod_{d \mid n} d = n^{\tau(n)/2}$.


Exercise 4.1.4. (a) Prove that $\phi(n^2) = n\phi(n)$.
(b) Prove that if $\phi(n) \mid n - 1$, then $n$ is squarefree.
(c) Find all integers $n$ for which $\phi(n)$ is odd.


Exercise 4.1.5.† Suppose that $n$ has exactly $k$ prime factors, each of which is $> k$. Prove that
$\phi(n) \ge n/2$.


4.2. Perfect numbers. “The whole is equal to the sum of its parts.” 


The number 6 is a perfect number since it is the sum of its proper divisors (the
proper divisors of $m$ are those divisors $d$ of $m$ for which $1 \le d < m$); that is,


$$6 = 1 + 2 + 3.$$


Six is a number perfect in itself, and not because God created all things 
in six days; rather, the converse is true. God created all things in six 
days because the number is perfect. 

— from The City of God by SAINT AUGUSTINE (354-430) 


The next perfect number is $28 = 1 + 2 + 4 + 7 + 14$, which is the number of days in
a lunar month. However the next, $496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248$,
appears to have little obvious cosmic relevance. Nonetheless, we will be interested
in trying to classify all perfect numbers. To create an equation we will add $n$ to
both sides to obtain that $n$ is perfect if and only if


$$2n = \sigma(n), \quad\text{where } \sigma(n) := \sum_{d \mid n} d.$$


Exercise 4.2.1. Show that $\sigma(n) = \sum_{d \mid n} n/d$, and so deduce that $n$ is perfect if and only if
$\sum_{d \mid n} \frac{1}{d} = 2$.




Exercise 4.2.2. (a) Prove that each divisor $d$ of $ab$ can be written as $\ell m$ where $\ell \mid a$ and $m \mid b$.
(b) Show that if $(a,b) = 1$, then there is a unique such pair $\ell, m$ for each divisor $d$.


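By this last exercise we see that if $(a,b) = 1$, then
$$\sigma(ab) = \sum_{d \mid ab} d = \sum_{\ell \mid a,\ m \mid b} \ell m = \sum_{\ell \mid a} \ell \sum_{m \mid b} m = \sigma(a)\sigma(b),$$


proving that $\sigma$ is a multiplicative function. Now

$$\sigma(p^k) = 1 + p + p^2 + \cdots + p^k$$
by definition, and so
$$\sigma(p^k) = \frac{p^{k+1} - 1}{p - 1}.$$

For example $\sigma(2^5\cdot 3^3\cdot 5^2\cdot 7) = \frac{2^6 - 1}{2 - 1}\cdot\frac{3^4 - 1}{3 - 1}\cdot\frac{5^3 - 1}{5 - 1}\cdot\frac{7^2 - 1}{7 - 1}$.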
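The prime-power formula and multiplicativity can be compared with a direct divisor sum. In this Python sketch (our own illustration) the example $n = 2^5\cdot 3^3\cdot 5^2\cdot 7 = 151200$ is evaluated both ways.

```python
def sigma(n):
    """sigma(n): the sum of all divisors of n, straight from the definition."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def sigma_prime_power(p, k):
    """sigma(p^k) = 1 + p + ... + p^k = (p^(k+1) - 1) / (p - 1)."""
    return (p ** (k + 1) - 1) // (p - 1)


n = 2 ** 5 * 3 ** 3 * 5 ** 2 * 7
formula = (sigma_prime_power(2, 5) * sigma_prime_power(3, 3)
           * sigma_prime_power(5, 2) * sigma_prime_power(7, 1))
print(n, sigma(n), formula)   # 151200 624960 624960
```

The four factors are $63\cdot 40\cdot 31\cdot 8$, and their product agrees with the brute-force divisor sum.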


Euclid observed that the first perfect numbers factor as $6 = 2\cdot 3$ where $3 = 2^2 - 1$
is prime, and $28 = 2^2\cdot 7$ where $7 = 2^3 - 1$ is prime, and then that this pattern
persists:

Proposition 4.2.1 (Euclid). If 2? —1 is a prime number, then 2?—1(2? — 1) is a 
perfect number. 


The cases $p = 2, 3, 5$ correspond to the Mersenne primes $2^2 - 1 = 3$, $2^3 - 1 = 7$,
$2^5 - 1 = 31$ and therefore yield the three smallest perfect numbers 6, 28, 496
(and the next smallest examples are given by $p = 7$ and $p = 13$).


Proof. Since $\sigma$ is multiplicative we have, for $n = 2^{p-1}(2^p - 1)$,
$$\sigma(n) = \sigma(2^{p-1}) \cdot \sigma(2^p - 1) = \frac{2^p - 1}{2 - 1} \cdot 2^p = 2n,$$
since $\sigma(q) = q + 1$ for the prime $q = 2^p - 1$.


After extensive searching one finds that perfect numbers of the form $2^{p-1}(2^p - 1)$
with $2^p - 1$ prime appear to be the only perfect numbers. Euler succeeded in proving
that these are the only even perfect numbers, and we believe (but don't know) that
there are no odd perfect numbers. If there are no odd perfect numbers, as claimed,
then we would achieve our goal of classifying all the perfect numbers.


Theorem 4.2 (Euler). If n is an even perfect number, then there exists a prime
number of the form $2^p - 1$ such that $n = 2^{p-1}(2^p - 1)$.


In an earlier exercise we showed that if $2^p - 1$ is prime, then p must itself be prime.
Now, although $2^2 - 1$, $2^3 - 1$, $2^5 - 1$, and $2^7 - 1$ are all prime, $2^{11} - 1 = 23 \times 89$ is not,
so we do not know for sure whether $2^p - 1$ is prime, even if p is prime. However it
is conjectured that there are infinitely many Mersenne primes $M_p = 2^p - 1$,¹ which
would imply that there are infinitely many even perfect numbers.


¹It is known that $2^p - 1$ is prime for p = 2, 3, 5, 7, 13, 17, 19, ..., 82589933, a total of 51 values
as of September 2019 (and this last is currently the largest prime explicitly known). There is a long
history of the search for Mersenne primes, from the first serious computers to the first great distributed
computing project, GIMPS (Great Internet Mersenne Prime Search).




Proof. Any even integer can be written as $n = 2^{k-1} m$ where m is odd and $k \ge 2$,
so that if n is perfect, then
$$2^k m = 2n = \sigma(n) = \sigma(2^{k-1})\sigma(m) = (2^k - 1)\sigma(m).$$
Now $(2^k - 1, 2^k) = 1$ and so $2^k - 1$ divides m. Writing $m = (2^k - 1)M$ we find that
$\sigma(m) = 2^k M = m + M$. That is, $\sigma(m)$, which is the sum of all of the divisors of m,
equals the sum of just two of its divisors, namely m and M (and note that these
are different integers since $m = (2^k - 1)M \ge (2^2 - 1)M > M$). This implies that
m and M are the only divisors of m. The only integers with just two divisors are
the primes, so that m is a prime and M = 1, and the result follows.
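The classification can be tested numerically. The following Python sketch is only an illustration, not a proof, and its helpers `is_prime` and `sigma` are hypothetical trial-division routines: it builds $2^{p-1}(2^p - 1)$ for each $p \le 13$ with $2^p - 1$ prime, and separately searches every $n \le 10^4$ for $\sigma(n) = 2n$.

```python
def is_prime(m):
    """Trial-division primality test (fine for the small m used here)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def sigma(n):
    """Sum of the positive divisors of n."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d + (n // d if d != n // d else 0)
        d += 1
    return total


# Euclid's construction: 2^(p-1) (2^p - 1) whenever 2^p - 1 is prime.
euclid = [2 ** (p - 1) * (2 ** p - 1) for p in range(2, 14) if is_prime(2 ** p - 1)]

# Brute force: every n up to 10^4 with sigma(n) = 2n.
brute = [n for n in range(1, 10 ** 4 + 1) if sigma(n) == 2 * n]

print(euclid)  # [6, 28, 496, 8128, 33550336]
print(brute)   # [6, 28, 496, 8128]
```

The search below $10^4$ returns exactly the first four of Euclid's numbers; $p = 11$ is skipped because $2^{11} - 1 = 23 \times 89$.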


It is widely believed that the only perfect numbers are those identified by
Euclid, that is, that there are no odd perfect numbers. It has been proved that if
there is an odd perfect number, then it must be $> 10^{1500}$, and it would have to
have more than 100 (not necessarily distinct) prime factors.


Exercise 4.2.3. (a) Prove that if p is odd and k is odd, then $\sigma(p^k)$ is even.
(b)† Deduce that if n is an odd perfect number, then $n = p^e m^2$ where p is a prime that does
not divide the integer $m > 1$ and $p \equiv e \equiv 1 \pmod 4$.


Exercise 4.2.4. Fix an integer $m > 1$. Show that there are only finitely many integers n for which
$\sigma(n) = m$.


Exercise 4.2.5.† (a) Prove that for all integers $n > 1$ we have the inequalities
$$\prod_{p\mid n}\left(1 + \frac{1}{p}\right) \le \frac{\sigma(n)}{n} < \prod_{p\mid n} \frac{p}{p - 1}.$$


(b) We have seen that every even perfect number has exactly two distinct prime factors. Prove 
that every odd perfect number has at least three distinct prime factors. 


Additional exercises 


Exercise 4.3.1. Suppose that $f(n) = 0$ if n is even, $f(n) = 1$ if $n \equiv 1 \pmod 4$, and $f(n) = -1$
if $n \equiv -1 \pmod 4$. Prove that $f(\cdot)$ is a multiplicative function.


Exercise 4.3.2.† Suppose that $r(\cdot)$ is a multiplicative function taking values in $\mathbb{C}$. Let $f(n) = 1$
if $r(n) \ne 0$, and $f(n) = 0$ if $r(n) = 0$. Prove that $f(\cdot)$ is also a multiplicative function.


Exercise 4.3.3.† Suppose that f is a multiplicative function, such that the value of f(n) depends
only on the value of n (mod 3). What are the possibilities for f?


Exercise 4.3.4.† Suppose that f is a multiplicative function, such that the value of f(n) depends
only on the value of n (mod 8). What are the possibilities for f?


Exercise 4.3.5. How many of the fractions a/n with $1 \le a \le n - 1$ are reduced?


Looking at the values of $\phi(m)$, Carmichael conjectured that for all integers m
there exists an integer $n \ne m$ such that $\phi(n) = \phi(m)$.


Exercise 4.3.6.† (a) Find all integers n for which $\phi(2n) = \phi(n)$.
(b) Find all integers n for which $\phi(3n) = \phi(2n)$.
(c) Can you find other classes of m for which Carmichael's conjecture is true?
Carmichael's conjecture is still an open problem but it is known that if it is false, then the
smallest counterexample is $> 10^{10^{10}}$.




Exercise 4.3.7.†
(a) Given a polynomial $f(x) \in \mathbb{Z}[x]$ let $N_f(m)$ denote the number of a (mod m) for which $f(a) \equiv 0 \pmod m$. Show that $N_f(m)$ is a multiplicative function.
(b) Be explicit about $N_f(m)$ when $f(x) = x^2 - 1$. (You can use section B.8.)


Exercise 4.3.8.† Given a polynomial $f(x) \in \mathbb{Z}[x]$ let $R_f(m)$ denote the number of b (mod m)
for which there exists a (mod m) with $f(a) \equiv b \pmod m$. Show that $R_f(m)$ is a multiplicative
function. Can you be more explicit about $R_f(m)$ for $f(x) = x^2$, the example of exercise 2.5.6?


Exercise 4.3.9. Let $\tau(n)$ denote the number of divisors of n (as in section B.3), and let $\omega(n)$
and $\Omega(n)$ be the number of prime divisors of n not counting and counting repeated prime factors,
respectively. Therefore $\tau(12) = 6$, $\omega(12) = 2$, and $\Omega(12) = 3$. Prove that
$$2^{\omega(n)} \le \tau(n) \le 2^{\Omega(n)}$$
for all integers $n \ge 1$.


Exercise 4.3.10. Let $\sigma_k(n) = \sum_{d\mid n} d^k$. Prove that $\sigma_k(n)$ is multiplicative.


Exercise 4.3.11. (a) Prove that $\tau(ab) \le \tau(a)\tau(b)$ for all positive integers a and b, with equality
if and only if $(a,b) = 1$.
(b) Prove that $\sigma_k(ab) \le \sigma_k(a)\sigma_k(b)$ for all positive integers a, b, and k.
(c) Prove that $\sigma_{k+\ell}(n) \le \sigma_k(n)\sigma_\ell(n)$ for all positive integers k, $\ell$, and n.


Exercise 4.3.12. Give closed formulas for (a) $\sum_{m=1}^{n} \gcd(m,n)$ and (b)† $\sum_{m=1}^{n} \mathrm{lcm}(m,n)$ in
terms of the prime power factors of n.


Exercise 4.3.13. n is multiplicatively perfect if it equals the product of its proper divisors.
(a) Show that n is multiplicatively perfect if and only if $\tau(n) = 4$.
(b) Classify exactly which integers n satisfy this.


The integers m and n are amicable if the sum of the proper divisors of m equals
n and the sum of the proper divisors of n equals m. For example, 220 and 284 are
amicable, as are 1184 and 1210.²


Exercise 4.3.14. (a) Show that m and n are amicable if and only if $\sigma(m) = \sigma(n) = m + n$.
(b) Verify Thabit ibn Qurrah's 9th-century claim that if $p = 3 \times 2^{n-1} - 1$, $q = 3 \times 2^n - 1$, and
$r = 9 \times 2^{2n-1} - 1$ are each odd primes, then $2^n pq$ and $2^n r$ are amicable.³
(c) Find an example (other than the two given above) using the construction in (b).


An integer n is abundant if the sum of its proper divisors is $> n$, for example
$n = 12$; and n is deficient if the sum of its proper divisors is $< n$, for example $n = 8$.
Each positive integer is either deficient, perfect, or abundant, a classification that
goes back to antiquity.⁴


Exercise 4.3.15. (a) Prove that every prime number is deficient.
(b) Prove that every multiple of 6 is abundant.
(c) How do these concepts relate to the value of $\sigma(n)/n$?
(d) Prove that every multiple of an abundant number is abundant.
(e)† Prove that if n is the product of k distinct primes that are each $> k$, then n is deficient.
(f) Prove that every divisor of a deficient number is deficient.


²The 14th-century scholar Ibn Khaldun claimed: “Experts on talismans assure me that these
numbers have a special influence in establishing strong bonds of friendship between individuals ... A
bond so close that they cannot be separated. The author of the Ghaia, and other great masters in this
art, swear that they have seen this happen again and again.”

³This was rediscovered by Descartes in the 17th century.

⁴Specifically a book by Nicomachus from A.D. 100. Another interesting reference is the 10th-century German nun Hrotsvitha, who depicts the heroine of her play “Sapientia” challenging Emperor
Hadrian, while he is persecuting Christians, to surmise the ages of her children from information about
this classification and the number of Olympic games that each has been alive for!




Carl Andre is a controversial minimalist artist, his most infamous work being his
Equivalent I-VIII series exhibited at several of the world's leading museums. Each
of the eight sculptures involves 120 bricks arranged in a different rectangular formation. In Equivalent VIII, at the Tate Modern in London, the bricks are stacked 2
deep, 6 wide, and 10 long. (See http://thesingleroad.blogspot.co.uk/2011/01/test-post.html for a photo of the original eight formations.)


Exercise 4.3.16. (a) How many different 2-deep, 120-brick, rectangular formations are there? 
(b) What if there must be at least three bricks along the width and along the length? 


Appendix 4A. More multiplicative functions


4.4. Summing multiplicative functions 


We have already seen that the functions 1, n, $\phi(n)$, $\sigma(n)$, and $\tau(n)$ are all multiplicative. In Proposition 4.1.1 we saw the surprising connection that n is the sum
of the multiplicative function $\phi(d)$, summed over the divisors d of n. Similarly $\tau(n)$
is the sum of 1, and $\sigma(n)$ is the sum of d, summed over the divisors d of n. This
suggests that there might be a general such phenomenon.


Theorem 4.3. For any given multiplicative function f, the function
$$F(n) = \sum_{d\mid n} f(d)$$
is also multiplicative.


Proof. Suppose that $n = ab$ with $(a,b) = 1$. In exercise 4.2.2 we showed that the
divisors of n can be written as $\ell m$ where $\ell \mid a$ and $m \mid b$. Note that $(\ell, m) = 1$ since
$(\ell, m)$ divides $(a,b) = 1$, and so $f(\ell m) = f(\ell) f(m)$. Therefore
$$F(ab) = \sum_{d\mid ab} f(d) = \sum_{\ell\mid a,\ m\mid b} f(\ell m) = \sum_{\ell\mid a} f(\ell) \sum_{m\mid b} f(m) = F(a)F(b),$$
as desired.


It is worth noting that if we write $m = n/d$, then Theorem 4.3 becomes
$$F(n) = \sum_{m\mid n} f(n/m).$$


Above we have the examples $\{F(n), f(d)\} = \{n, \phi(d)\}$, $\{\tau(n), 1\}$, $\{\sigma(n), d\}$;
but what about other F(n)? For $F(n) = 1$ we have $1 = \sum_{d\mid n} \delta(d)$ where




$\delta(d) = 1$ if $d = 1$, and $= 0$ otherwise. Finding f when $F(n) = \phi(n)$ looks more
complicated. This leads us to two questions: For every multiplicative F, does there
exist a multiplicative f for which $F(n) = \sum_{d\mid n} f(d)$? And, if so, is f unique? To
answer these questions we begin by defining another multiplicative function which
arises in a rather different context.


Exercise 4.4.1. Prove that
$$\frac{n}{\phi(n)} = \sum_{\substack{d\mid n\\ d \text{ squarefree}}} \frac{1}{\phi(d)}.$$
4.5. Inclusion-exclusion and the Möbius function


In the proof of Theorem 4.1 we saw that if $n = p^k$ is a prime power, then $\phi(n)$ is
the total number of integers up to n, minus the number of those that are divisible
by p. This leads to the formula
$$\phi(n) = n - \frac{n}{p} = n\left(1 - \frac{1}{p}\right).$$


Similarly if $n = p^a q^b$, then we wish to count the number of positive integers up to
n that are not divisible by either p or q. To do so we take the n integers up to n, and
subtract the n/p that are divisible by p and the n/q that are divisible by q. This
is not quite right as we have twice subtracted the n/pq integers divisible by both p
and q, and so we need to add them back in. This leads to the formula
$$\phi(n) = n - \frac{n}{p} - \frac{n}{q} + \frac{n}{pq} = n\left(1 - \frac{1}{p}\right)\left(1 - \frac{1}{q}\right).$$


This argument generalizes to arbitrary n, though we need to keep track of the terms
of the form $\pm n/d$. In our examples so far, we see that each such d is a divisor of
n, but the term n/d only has a non-zero coefficient if d is squarefree. When d is
squarefree the coefficient is given by $(-1)^{\omega(d)}$ where
$$\omega(d) := \sum_{p \text{ prime},\ p\mid d} 1$$


is the number of distinct prime factors of d. One therefore deduces that the coefficient of n/d is always given by the Möbius function, $\mu(d)$, a multiplicative function
defined by
$$\mu(p) = -1, \quad\text{with } \mu(p^k) = 0 \text{ for all } k \ge 2, \text{ for every prime } p.$$
For example $\mu(1) = 1$, $\mu(2) = \mu(3) = -1$, $\mu(4) = 0$, $\mu(6) = \mu(10) = 1$, and
$\mu(1001) = -1$ as $1001 = 7 \times 11 \times 13$.
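Since $\mu$ depends only on the factorization of n, it is easy to compute by trial division for small n. A minimal Python sketch (the function name `mobius` is our own choice) reproducing the values just listed:

```python
def mobius(n):
    """Möbius function mu(n) for n >= 1, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divides the original n, so mu(n) = 0
            result = -result      # one more distinct prime factor
        p += 1
    # whatever is left over, if larger than 1, is one further prime factor
    return -result if n > 1 else result


print([mobius(n) for n in (1, 2, 3, 4, 6, 10, 1001)])
# prints [1, -1, -1, 0, 1, 1, -1]
```

Trial division is only sensible for small n; it is used here because it mirrors the definition directly.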


The argument for general n uses the inclusion-exclusion principle, which we 
formulate here to fit well with the topic of multiplicative functions. 


Corollary 4.5.1. We have
$$\sum_{d\mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1,\\ 0 & \text{otherwise.}\end{cases}$$


Proof. The result for n = 1 is trivial. If n is a prime power $p^a$ with $a \ge 1$, then
$\sum_{d\mid p^a} \mu(d) = 1 + (-1) + 0 + \cdots + 0 = 0$ by definition.


The result for general n then follows from Theorem 4.3 applied to $f = \mu$.




Exercise 4.5.1. (a) Show that if m is squarefree, then 


Ge) = Fae), 


d|m 
(b) Deduce Corollary 4.5.1.
A proof of Theorem 4.1 using the inclusion-exclusion principle. We want
a function that counts 1 if $(a,n) = 1$ and 0 otherwise. This counting function can
be given by Corollary 4.5.1:
$$\sum_{d\mid a \ \&\ d\mid n} \mu(d) = \begin{cases} 1 & \text{if } (a,n) = 1,\\ 0 & \text{otherwise.}\end{cases}$$
Therefore
$$\phi(n) = \sum_{a=1}^{n} \begin{cases} 1 & \text{if } (a,n) = 1,\\ 0 & \text{otherwise}\end{cases} \;=\; \sum_{a=1}^{n} \sum_{d\mid a \ \&\ d\mid n} \mu(d) \;=\; \sum_{d\mid n} \mu(d) \sum_{\substack{1 \le a \le n\\ d\mid a}} 1 \;=\; \sum_{d\mid n} \mu(d)\, \frac{n}{d}.$$
The last line comes from first swapping the order of summation and then using
exercise 1.7.6(c), as $[n/d] = n/d$ since each d divides n. Exercise completes
the proof.
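The identity $\phi(n) = \sum_{d\mid n} \mu(d)\, n/d$ obtained above can be checked against the definition of $\phi$ for small n. In this Python sketch, `phi_by_counting` and `phi_by_mobius` are hypothetical helper names, and `mobius` is a simple trial-division implementation:

```python
from math import gcd


def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def phi_by_counting(n):
    """Count 1 <= a <= n with gcd(a, n) = 1, straight from the definition."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def phi_by_mobius(n):
    """phi(n) = sum over d | n of mu(d) * (n / d); n // d is exact since d | n."""
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)


assert all(phi_by_counting(n) == phi_by_mobius(n) for n in range(1, 500))
print(phi_by_mobius(1001))  # 720
```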


Exercise 4.5.2. Prove that for any positive integer n we have
$$\sum_{d\mid n} \frac{\mu(d)}{d} = \prod_{p\mid n}\left(1 - \frac{1}{p}\right).$$


Exercise 4.5.3. Prove that $\mu(n)^2$ is the characteristic function for the squarefree integers, and
deduce that
$$\frac{n}{\phi(n)} = \sum_{d\mid n} \frac{\mu(d)^2}{\phi(d)}.$$


4.6. Convolutions and the Möbius inversion formula
In the proof of Theorem 4.1 in the last section we saw that
$$\phi(n) = \sum_{d\mid n} \mu(d)\, \frac{n}{d}.$$
If we let $r = n/d$, then the sum is over all factorizations of n into two positive
integers $n = dr$, and so
$$\phi(n) = \sum_{\substack{d, r \ge 1\\ n = dr}} \mu(d)\, r.$$
This can be compared to Proposition 4.1.1 which yielded
$$n = \sum_{d\mid n} \phi(d) = \sum_{\substack{d, r \ge 1\\ n = dr}} \phi(d)\, 1(r),$$
n=dr 




where 1(r) is the function that is always 1 (which is a multiplicative function). 
Something similar happens for the sum of any function f defined on the positive 
integers. 


Theorem 4.4 (The Möbius inversion formula). For any two arithmetic functions
f and g we have
$$g(n) = \sum_{ab=n} f(b) \quad\text{for all integers } n \ge 1$$
if and only if
$$f(m) = \sum_{cd=m} \mu(c) g(d) \quad\text{for all integers } m \ge 1.$$
This can be rewritten as
$$g(n) = \sum_{d\mid n} f(d) \text{ for all } n \ge 1 \quad\text{if and only if}\quad f(m) = \sum_{d\mid m} \mu(m/d) g(d) \text{ for all } m \ge 1.$$
Proof. If $f(m) = \sum_{cd=m} \mu(c) g(d)$ for all integers $m \ge 1$, then
$$\sum_{ab=n} f(b) = \sum_{ab=n} \sum_{cd=b} \mu(c) g(d) = \sum_{acd=n} \mu(c) g(d) = \sum_{d\mid n} g(d) \sum_{ac=n/d} \mu(c) = g(n),$$
since this last sum is 0 unless $n/d = 1$, that is, unless $d = n$. Similarly if
$g(n) = \sum_{ab=n} f(b)$ for all integers $n \ge 1$, then
$$\sum_{cd=m} \mu(c) g(d) = \sum_{cd=m} \mu(c) \sum_{ab=d} f(b) = \sum_{abc=m} \mu(c) f(b) = \sum_{b\mid m} f(b) \sum_{ac=m/b} \mu(c) = f(m),$$
as desired.
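Theorem 4.4 holds for any arithmetic functions, not just multiplicative ones. The Python sketch below illustrates this with our own choice of test function $f(n) = n^2 + 1$ (which is not multiplicative, since $f(1) = 2$): it forms $g(n) = \sum_{d\mid n} f(d)$ and then recovers f from g via $f(m) = \sum_{d\mid m} \mu(m/d) g(d)$. The helper names `summatory` and `invert` are hypothetical.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def summatory(f, N):
    """g(n) = sum over d | n of f(d), for 1 <= n <= N, as a dict."""
    return {n: sum(f(d) for d in divisors(n)) for n in range(1, N + 1)}


def invert(g, N):
    """Recover f(m) = sum over d | m of mu(m/d) g(d), for 1 <= m <= N."""
    return {m: sum(mobius(m // d) * g[d] for d in divisors(m)) for m in range(1, N + 1)}


def f(n):
    # arbitrary test function, deliberately not multiplicative
    return n * n + 1


N = 200
recovered = invert(summatory(f, N), N)
assert all(recovered[m] == f(m) for m in range(1, N + 1))
```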


In the discussion above we saw several examples of the convolution $f * g$ of two
multiplicative functions f and g, which we define by
$$(f * g)(n) := \sum_{ab=n} f(a) g(b).$$
Note that $f * g = g * f$. We saw that if $I(n) = n$, then $\phi * 1 = I$ and $\mu * I = \phi$, as
well as $1 * \mu = \delta$.
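The convolution can be evaluated straight from its definition by running over the divisors a of n. The following minimal Python sketch (the names `convolve`, `one`, `identity`, and `delta` are our own) checks the identities $\phi * 1 = I$, $\mu * I = \phi$, and $1 * \mu = \delta$, together with commutativity, for all $n \le 200$:

```python
from math import gcd


def convolve(f, g, n):
    """Dirichlet convolution (f * g)(n) = sum over ab = n of f(a) g(b)."""
    return sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)


def one(n):        # the constant function 1
    return 1


def identity(n):   # I(n) = n
    return n


def delta(n):      # delta(n) = 1 if n = 1, and 0 otherwise
    return 1 if n == 1 else 0


def phi(n):
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


for n in range(1, 201):
    assert convolve(phi, one, n) == identity(n)        # phi * 1 = I
    assert convolve(mobius, identity, n) == phi(n)     # mu * I = phi
    assert convolve(one, mobius, n) == delta(n)        # 1 * mu = delta
    assert convolve(phi, mobius, n) == convolve(mobius, phi, n)  # f * g = g * f
```

This is a numerical check for small n only, not a proof.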


Exercise 4.6.1. Prove that $\delta * f = f$ for all f, $\tau = 1 * 1$, and $\sigma = 1 * I$.


Proposition 4.6.1. For any two multiplicative functions f and g, the convolution
$f * g$ is also multiplicative.
Exercise 4.6.2. Prove that if $ab = mn$, then there exist integers r, s, t, u with $a = rs$, $b = tu$, $m = rt$, $n = su$ with $(s,t) = 1$.


Proof. Suppose that $(m,n) = 1$. For $h = f * g$, we have
$$h(mn) = \sum_{ab=mn} f(a) g(b).$$




We use the exercise above and note that $(r,s)$ and $(t,u)$ both divide $(m,n) = 1$ and
so both equal 1. Therefore $f(a) = f(rs) = f(r) f(s)$ and $g(b) = g(tu) = g(t)g(u)$.
This implies that
$$h(mn) = \sum_{\substack{rt=m\\ su=n}} f(rs) g(tu) = \sum_{rt=m} f(r) g(t) \sum_{su=n} f(s) g(u) = h(m)h(n).$$


In this new language, Theorem 4.3, which states that $1 * f$ is multiplicative
whenever f is, is the special case $g = 1$ of Proposition 4.6.1. Corollary 4.5.1 states
that $1 * \mu = \delta$. The Möbius inversion formula states that $F = 1 * f$ if and only if
$f = \mu * F$. It is also easy to prove the Möbius inversion formula with this notation
since if $F = 1 * f$, then $\mu * F = \mu * 1 * f = \delta * f = f$; and if $f = \mu * F$, then
$1 * f = 1 * \mu * F = \delta * F = F$.


Exercise 4.6.3. Prove that $(\mu * \sigma)(n) = n$ for all integers $n \ge 1$.


Exercise 4.6.4. (a) Show that $(a * f) + (b * f) = (a + b) * f$.
(b) Let $f(n) \ge 0$ for all integers $n \ge 1$. Prove that $(1 * f)(n) + (\mu * f)(n) \ge 2f(n)$ for all integers
$n \ge 1$.
(c) Prove that $\sigma(n) + \phi(n) \ge 2n$ for all integers $n \ge 1$.


Exercise 4.6.5. Suppose that $g(n) = \prod_{d\mid n} f(d)$. Deduce that $f(n) = \prod_{d\mid n} g(d)^{\mu(n/d)}$.


4.7. The Liouville function 


The number of prime factors of a given integer $n = \prod_{i=1}^{k} p_i^{e_i}$ can be interpreted in
two different ways:
$$\omega(n) := \sum_{p\mid n} 1 = \#\{\text{distinct primes that divide } n\} = k$$
and
$$\Omega(n) := \sum_{\substack{p \text{ prime},\ j \ge 1\\ p^j \mid n}} 1 = \#\{\text{distinct prime powers that divide } n\} = \sum_{i=1}^{k} e_i.$$
In other words, $\Omega(n)$ counts the number of primes when one factors n into primes
without using exponents, so $\Omega(12) = 3$ as $12 = 2 \times 2 \times 3$, while $\omega(n)$ counts the
number of primes when one factors n into primes using exponents, so $\omega(12) = 2$ as
$12 = 2^2 \cdot 3$. For other examples, $\omega(27) = 1$ with $\Omega(27) = 3$, and $\omega(36) = 2$ with
$\Omega(36) = 4$, while $\Omega(105) = \omega(105) = 3$.

Another interesting multiplicative function is Liouville's function, defined at
the start of this chapter by $\lambda(n) = (-1)^{\Omega(n)}$ so that, for example, $\lambda(12) = (-1)^3 = -1$.
We notice that $\lambda$ is the totally multiplicative function that agrees with $\mu$ on
the squarefree integers. Liouville's function feels, intuitively, more natural, but
Möbius's function fits better with the theory.
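Both counts, and hence $\lambda$, can be read off from the prime factorization. A minimal Python sketch (with hypothetical helpers `factorize`, `omega`, `big_omega`, and `liouville`) reproducing the examples above:

```python
def factorize(n):
    """Prime factorization of n >= 1 as a dict {p: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def omega(n):
    """Number of distinct prime factors of n."""
    return len(factorize(n))


def big_omega(n):
    """Number of prime factors of n, counted with multiplicity."""
    return sum(factorize(n).values())


def liouville(n):
    return (-1) ** big_omega(n)


for n in (12, 27, 36, 105):
    print(n, omega(n), big_omega(n))
# 12 2 3
# 27 1 3
# 36 2 4
# 105 3 3
print(liouville(12))  # -1
```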


Exercise 4.7.1. Prove that $\Omega(n) \ge \omega(n)$ for all integers $n \ge 1$, with equality if and only if n is
squarefree.


Exercise 4.7.2. Prove that $\lambda * \mu^2 = \delta$.




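Exercise 4.7.3. (a) Prove that
$$\sum_{d\mid n} \lambda(d) = \begin{cases} 1 & \text{if } n \text{ is a square},\\ 0 & \text{otherwise.}\end{cases}$$
(b)† By summing the formula in (a) over all positive integers $n \le N$, deduce that for all integers
$N \ge 1$ we have
$$\sum_{n \ge 1} \lambda(n) \left[\frac{N}{n}\right] = [\sqrt{N}].$$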


Additional exercises 


Exercise 4.8.1. Prove that $\mu(n)\mu(n+1)\mu(n+2)\mu(n+3) = 0$ for all integers $n \ge 1$.
Exercise 4.8.2. Prove that $\phi(n) + \sigma(n) = 2n$ if and only if $n = 1$ or n is a prime.


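Exercise 4.8.3. (a) By summing the formula in Corollary 4.5.1 over all positive integers $n \le N$, deduce that
$$\sum_{n=1}^{N} \mu(n) \left[\frac{N}{n}\right] = 1 \quad\text{for all } N \ge 1.$$
(b)† Deduce that
$$\left|\sum_{n \le N} \frac{\mu(n)}{n}\right| \le 1 \quad\text{for all } N \ge 1.$$
It is a much deeper problem to prove that $\sum_{n \le N} \mu(n)/n$ tends to a limit as $N \to \infty$.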


Exercise 4.8.4. (a) Prove that if f is an arithmetic function, then
$$\sum_{n \ge 1} f(n) \frac{x^n}{1 - x^n} = \sum_{m \ge 1} (1 * f)(m)\, x^m,$$
without worrying about convergence.
(b) Write out explicitly the example $f = \mu$ as well as some other common multiplicative functions.


Appendix 4B. Dirichlet series and multiplicative functions


4.9. Dirichlet series 


In section 4.2 we started looking at finite sums of multiplicative functions. Now
we study the (infinite) sum of a given multiplicative function g(n), summing over
all positive integers n. The Fundamental Theorem of Arithmetic gives each positive integer n a unique factorization $n = \prod_p p^{e_p}$, and this allows us to factor g(n) as
$\prod_p g(p^{e_p})$. Proceeding as in the proof of Theorem 4.3 and putting convergence
issues to one side for now, we obtain
$$\sum_{n \ge 1} g(n) = \prod_{p \text{ prime}} \left(1 + g(p) + g(p^2) + \cdots\right).$$
The product over the primes is known as the Euler product. In all of the examples
of multiplicative functions given above, both sides of this equation diverge. However
one can take $g(n) = f(n)/n^s$ for well-chosen s, so that convergence is no longer a
problem. (Note that if f(n) is multiplicative, then $f(n)/n^s$ is multiplicative for any
$s \in \mathbb{C}$ by exercise 4.0.3.) In this case we define the Dirichlet series
$$F(s) = \sum_{n \ge 1} \frac{f(n)}{n^s} = \prod_{p \text{ prime}} \left(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots\right),$$
which is only really well-defined for complex numbers s for which the infinite sum
is absolutely convergent.⁵ If f is totally multiplicative, then the sum for each p is


⁵One can think of a Dirichlet series in two ways: first as a function in its own right, in which case
questions about convergence are important (since the function needs to be well-defined), and second as
a formal sum which allows us to work directly with the values of f(n) so as to focus on properties of
f(n), in which case we do not consider F(s) as s varies. Such formal manipulations of F(s) connect to a lot of
combinatorial ideas, just as generating functions allow us to better understand sequences.




a geometric series and so
$$F(s) = \sum_{n \ge 1} \frac{f(n)}{n^s} = \prod_{p \text{ prime}} \left(1 - \frac{f(p)}{p^s}\right)^{-1}. \tag{4.9.1}$$


If an infinite sum is absolutely convergent, then one can rearrange the terms in
the sum with impunity. Now $|n^s| = n^{\mathrm{Re}(s)}$ so if we let $\sigma = \mathrm{Re}(s)$, then
$$\sum_{n \ge 1} \left|\frac{f(n)}{n^s}\right| = \sum_{n \ge 1} \frac{|f(n)|}{n^{\sigma}}.$$
For multiplicative functions of interest the $|f(n)|$ are typically bounded by a constant times a power of n, which implies that there exists some real number $\sigma_0$ such
that F(s) is absolutely convergent precisely when $\mathrm{Re}(s) > \sigma_0$ or $\ge \sigma_0$. (We will
work out the details in the special case where each $f(n) = 1$ in appendix 5B.)


Exercise 4.9.1.† (a) Prove that if there exist a constant $c > 0$ and $\tau \in \mathbb{R}$ such that $|f(n)| \le cn^{\tau}$ for all integers $n \ge 1$, then F(s) is absolutely convergent for $s = \sigma + it$ with $\sigma > 1 + \tau$.
(b) Prove that if F(s) is absolutely convergent for $\mathrm{Re}(s) = \sigma$, then there exists a constant $c > 0$
such that $|f(n)| \le cn^{\sigma}$ for all integers $n \ge 1$.


Some interesting examples of Dirichlet series include the Riemann zeta-function
$$\zeta(s) = \sum_{n \ge 1} \frac{1}{n^s}, \quad\text{the Dirichlet series with } f(n) = 1 \text{ for each } n \ge 1,$$
and its inverse
$$\frac{1}{\zeta(s)} = \sum_{n \ge 1} \frac{\mu(n)}{n^s}, \quad\text{the Dirichlet series with } f(n) = \mu(n) \text{ for each } n \ge 1.$$
We will prove that this really is the inverse of $\zeta(s)$ in the next section.


4.10. Multiplication of Dirichlet series 


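Given two multiplicative functions f and g, let $h = f * g$ be the convolution of f
and g, as defined in appendix 4A. If we multiply the associated Dirichlet series for
f and g, say
$$F(s) = \sum_{a \ge 1} \frac{f(a)}{a^s}, \quad G(s) = \sum_{b \ge 1} \frac{g(b)}{b^s}, \quad\text{and}\quad H(s) = \sum_{n \ge 1} \frac{h(n)}{n^s},$$
then, grouping together terms where $ab = n$, we have
$$F(s)G(s) = \sum_{a, b \ge 1} \frac{f(a) g(b)}{(ab)^s} = \sum_{n \ge 1} \frac{1}{n^s} \sum_{ab=n} f(a) g(b) = \sum_{n \ge 1} \frac{(f * g)(n)}{n^s} = \sum_{n \ge 1} \frac{h(n)}{n^s} = H(s).$$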




Define $1(s) = 1$, the Dirichlet series with nth coefficient $\delta(n)$. Corollary 4.5.1
yields that $\mu * 1 = \delta$ and so
$$\left(\sum_{n \ge 1} \frac{\mu(n)}{n^s}\right) \zeta(s) = 1,$$
and so we justify our claim for the Dirichlet series of $\zeta(s)^{-1}$. On the other hand,
by considering the coefficient of $1/n^s$ in the identity $\zeta(s) \cdot \zeta(s)^{-1} = 1(s)$, we can
deduce Corollary 4.5.1. The Möbius inversion formula can be restated as the more
transparent equivalence
$$G(s) = \zeta(s) F(s) \quad\text{if and only if}\quad F(s) = \zeta(s)^{-1} G(s).$$
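Numerically, one can watch the partial sums of $\zeta(s)$ and of $\sum \mu(n)/n^s$ multiply to nearly 1. The Python sketch below takes s = 2 and compares against Euler's classical evaluation $\zeta(2) = \pi^2/6$ (a known fact, not proved in this section); the cutoff N = 10000 is an arbitrary choice, and both neglected tails are smaller than $1/N$ in absolute value.

```python
import math


def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


N = 10_000
zeta_partial = sum(1 / n ** 2 for n in range(1, N + 1))
inverse_partial = sum(mobius(n) / n ** 2 for n in range(1, N + 1))
print(zeta_partial, math.pi ** 2 / 6)      # partial sum of zeta(2) vs pi^2 / 6
print(inverse_partial, 6 / math.pi ** 2)   # partial sum of mu(n)/n^2 vs 6 / pi^2
print(zeta_partial * inverse_partial)      # close to 1
```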


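Exercise 4.10.1. By studying the Euler products or otherwise, prove the following:
(a) $\sum_{n \ge 1} n/n^s = \zeta(s - 1)$;
(b) $\sum_{n \ge 1} \tau(n)/n^s = \zeta(s)^2$;
(c) $\sum_{n \ge 1} \sigma(n)/n^s = \zeta(s)\zeta(s-1)$ and $\sum_{n \ge 1} \sigma_k(n)/n^s = \zeta(s)\zeta(s-k)$ for all integers $k \ge 1$;
(d) $\sum_{n \ge 1} \mu(n)^2/n^s = \zeta(s)/\zeta(2s)$;
(e) $\sum_{n \ge 1} \phi(n)/n^s = \zeta(s-1)/\zeta(s)$;
(f) $\sum_{n \ge 1} \lambda(n)/n^s = \zeta(2s)/\zeta(s)$.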


Exercise 4.10.2. (a) Describe the identity in Proposition 4.1.1 in terms of Dirichlet series.
(b) Reprove exercise 4.10.1(d) and (e) by multiplying through by denominators.
(c) Give a formula for the coefficients of $F(s)\zeta(s)$ in terms of the values of the f(n).
(d) Suppose that f is totally multiplicative with $f(3) = 0$, $f(p) = 1$ if $p \equiv 1 \pmod 3$, and
$f(p) = -1$ if $p \equiv -1 \pmod 3$. Describe the coefficient of $1/n^s$ in $F(s)\zeta(s)$ in terms of the
prime power factors of n.


4.11. Other Dirichlet series of interest 


Taking the logarithm of the Euler product for the Riemann zeta-function,
$$\zeta(s) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1},$$
yields, since $-\log(1 - t)$ has Taylor series $t + \frac{t^2}{2} + \frac{t^3}{3} + \cdots$ whenever $|t| < 1$,
$$\log \zeta(s) = -\sum_{p \text{ prime}} \log\left(1 - \frac{1}{p^s}\right) = \sum_{p \text{ prime}} \sum_{m \ge 1} \frac{1}{m p^{ms}},$$
taking $t = 1/p^s$ for every prime p. Differentiating we obtain
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{p \text{ prime}} \sum_{m \ge 1} \frac{\log p}{p^{ms}}.$$
The right-hand side is a Dirichlet series in which the only non-zero Dirichlet coefficients occur for $1/n^s$ where n is a prime power. This provides us with a completely
different way to recognize primes from any that we have seen before, and it was
the genius of Riemann to see that this observation could be manipulated into a
technique for counting primes. For this reason we rewrite this as
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n \ge 1} \frac{\Lambda(n)}{n^s}$$




where we define the von Mangoldt function as
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^m, \text{ a prime power},\\ 0 & \text{otherwise.}\end{cases}$$
We say that $\Lambda$ is supported on the prime powers, since it is only when n is a prime
power that $\Lambda(n)$ can be non-zero.


Writing $\log n$ as a sum of $\log p$ over all prime powers $p^m$ dividing n, we obtain
the identity
$$\log n = \sum_{d\mid n} \Lambda(d).$$
This can be rewritten as $\log = 1 * \Lambda$ and so this identity can also be obtained by
taking the coefficient of $1/n^s$ in $-\zeta'(s) = \zeta(s) \cdot (-\zeta'(s)/\zeta(s))$. If instead we write
$$-\frac{\zeta'(s)}{\zeta(s)} = \frac{1}{\zeta(s)} \cdot (-\zeta'(s)),$$
then the coefficient of $1/n^s$ yields the identity
$$\Lambda(n) = \sum_{d\mid n} \mu(d) \log(n/d). \tag{4.11.1}$$


This identity relates the characteristic function for the prime numbers with the
convolution of the multiplicative function $\mu(\cdot)$ and the smooth function $\log(\cdot)$. It
can be used to prove the prime number theorem (see ), a subject which will 
be discussed in the next chapter. 
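Identity (4.11.1) and the identity $\log n = \sum_{d\mid n} \Lambda(d)$ can both be checked in floating point for small n. In this Python sketch, `von_mangoldt` computes $\Lambda(n)$ directly from its definition by stripping off the smallest prime factor; all helper names are our own.

```python
import math


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def smallest_prime_factor(n):
    """Smallest prime dividing n >= 2 (n itself if n is prime)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n


def von_mangoldt(n):
    """Lambda(n) = log p if n = p^m (m >= 1) for a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = smallest_prime_factor(n)
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0


for n in range(1, 301):
    via_mobius = sum(mobius(d) * math.log(n // d) for d in divisors(n))
    assert math.isclose(via_mobius, von_mangoldt(n), abs_tol=1e-9)  # (4.11.1)
    assert math.isclose(sum(von_mangoldt(d) for d in divisors(n)), math.log(n), abs_tol=1e-9)
```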

These ideas generalize very nicely to any multiplicative function f. The pth
Euler factor of the Dirichlet series F(s) for f is always invertible, since it is a
power series in $1/p^s$ with leading term 1:

Exercise 4.11.1.† (a) Suppose that $a(t) = 1 + a_1 t + a_2 t^2 + \cdots$. Prove that there exists an
inverse $b(t) = 1 + b_1 t + b_2 t^2 + \cdots$ for which $a(t)b(t) = 1$.
(b) Deduce that $\left(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots\right)$, the pth Euler factor of F(s), is invertible.


Let $G(s) = \sum_{n \ge 1} g(n)/n^s$ be the product over all primes p of these inverse Euler
factors, and so $F(s)G(s) = 1$, which implies that $f * g = \delta$. Now we define the
coefficients, $\Lambda_f(n)$, of the logarithmic derivative of F by
$$-\frac{F'(s)}{F(s)} = \sum_{n \ge 1} \frac{\Lambda_f(n)}{n^s}.$$
Working with this series leads to lots of intriguing identities.
Exercise 4.11.2. (a) Prove that $\Lambda_f(n) + \Lambda_g(n) = 0$ for all integers $n \ge 1$.
(b) Use the identity $-F'(s) = F(s) \cdot (-F'(s)/F(s))$ to prove that
$$f(n) \log n = \sum_{ab=n} f(a) \Lambda_f(b) \quad\text{for all integers } n \ge 1.$$
(c) Use the identity $-F'(s)/F(s) = G(s) \cdot (-F'(s))$ to prove that
$$\Lambda_f(n) = \sum_{ab=n} g(a) f(b) \log b \quad\text{for all integers } n \ge 1.$$
(d) Deduce (4.11.1).


Appendix 4C. Irreducible polynomials modulo p


The Fundamental Theorem of Algebra (see appendix 3F) states that
every monic polynomial mod p can be factored into monic irreducible polynomials in a
unique way, which is completely analogous to how positive integers are factored into
primes in the Fundamental Theorem of Arithmetic. In appendix 4B we associated
the primes with $\zeta(s)$, the generating function for the integers. In this appendix we
associate the irreducible monic polynomials mod p with the generating function for
the monic polynomials mod p.


4.12. Irreducible polynomials modulo p 


The monic polynomials mod p of degree d take the form
$$x^d + a_{d-1}x^{d-1} + \cdots + a_1 x + a_0 \quad\text{such that each } a_i \in \mathbb{Z}/p\mathbb{Z}.$$
(The notation $\mathbb{Z}/p\mathbb{Z}$ was introduced in appendix 2A, to stand for the residue classes
mod p.) There are p possible values for each $a_i$, and there are d different $a_i$ to be
selected, so there are a total of $p^d$ monic polynomials mod p of degree d.

Exercise 4.12.1. Suppose that h(x) is a given polynomial mod p of degree d. Prove that there
are exactly $p^{m-d}$ monic polynomials in $\mathbb{F}_p[x]$ of degree m that are divisible by h(x), provided
$m \ge d$.


We will determine $N_d$, the number of monic irreducible polynomials mod p of
degree d. This is surprisingly straightforward using the Möbius inversion formula.⁶
It is most elegant if we define the analogy to the von Mangoldt function by
$$\Lambda(g(x)) = \begin{cases} \deg p(x) & \text{if } g(x) = p(x)^k \text{ with } p(x) \text{ irreducible and integer } k \ge 1,\\ 0 & \text{otherwise.}\end{cases}$$


⁶Though there is no known analogous argument for counting the primes themselves.




We are going to compute the degree of the product of all of the monic polynomials
mod p of degree m, in two different ways: Firstly, since there are $p^m$ such polynomials and since they each have degree m, the total degree is $mp^m$. On the other
hand, the degree of each such polynomial h(x) equals the sum of the degrees of all
of its irreducible monic factors (counting each factor the number of times it occurs
in the factorization); that is,
$$\deg h(x) = \sum_{\substack{p(x) \text{ monic irreducible}\\ k \ge 1,\ p(x)^k \text{ divides } h(x)}} \deg p(x) = \sum_{\substack{g(x) \text{ monic}\\ g(x) \text{ divides } h(x)}} \Lambda(g(x)).$$


Therefore, summing this over all polynomials h(x) of degree m we obtain
$$mp^m = \sum_{h(x) \text{ monic of degree } m} \deg h(x) = \sum_{g(x) \text{ monic}} \Lambda(g(x)) \sum_{\substack{h(x) \text{ monic of degree } m\\ g(x) \text{ divides } h(x)}} 1.$$
If g(x) has degree d, then the last term is 0 unless $d \le m$, in which case exercise 4.12.1 yields that it equals $p^{m-d}$, so that
$$mp^m = \sum_{d=1}^{m} p^{m-d} \sum_{g(x) \text{ monic of degree } d} \Lambda(g(x)).$$
Subtracting p times this identity for $m - 1$ from this identity for m, we obtain
$$\sum_{g(x) \text{ monic of degree } m} \Lambda(g(x)) = p^m. \tag{4.12.1}$$


The terms on the left-hand side are given by $g(x) = p(x)^k$ for each factorization
$m = dk$ as p(x) runs over the irreducible polynomials of degree d. Therefore
$$\sum_{d\mid m} dN_d = p^m, \tag{4.12.2}$$
and then, by the Möbius inversion formula (given in section 4.6 of appendix 4A,
with $f(d) = dN_d$), we deduce that
$$mN_m = \sum_{ab=m} \mu(a) p^b = \sum_{d\mid m} \mu(d) p^{m/d}. \tag{4.12.3}$$
This is an exact formula for the number of irreducible polynomials of degree m.
The largest term in the sum on the right side comes from d = 1, the term being
$p^m$, so the number of irreducible polynomials of degree m is roughly $p^m/m$. The
second largest term has d = 2 (when m is even), so equals $-p^{m/2}$, and otherwise is
smaller. Therefore
$$|mN_m - p^m| \le \sum_{d\mid m,\ d \ne 1} p^{m/d} \le \sum_{k \le [m/2]} p^k < 2p^{[m/2]}.$$
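Formula (4.12.3) can be compared with a brute-force count. The Python sketch below assumes p = 2, so that polynomials over $\mathbb{F}_2$ can be stored as bit masks (bit i holding the coefficient of $x^i$); it marks every product of two monic polynomials of positive degree and counts what is left. The function names are hypothetical.

```python
def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def irreducible_count_formula(p, m):
    """N_m from (4.12.3): m * N_m = sum over d | m of mu(d) * p**(m // d)."""
    total = sum(mobius(d) * p ** (m // d) for d in range(1, m + 1) if m % d == 0)
    return total // m  # exact division, since the sum equals m * N_m


def gf2_mul(a, b):
    """Product of two polynomials over F_2 stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition over F_2 is XOR
        a <<= 1
        b >>= 1
    return result


def irreducible_count_bruteforce_gf2(m):
    """Count irreducible monic polynomials of degree m over F_2 by excluding all products."""
    reducible = set()
    for deg_a in range(1, m // 2 + 1):
        deg_b = m - deg_a
        for a in range(1 << deg_a, 1 << (deg_a + 1)):      # monic, degree deg_a
            for b in range(1 << deg_b, 1 << (deg_b + 1)):  # monic, degree deg_b
                reducible.add(gf2_mul(a, b))
    return (1 << m) - len(reducible)  # there are 2^m monic polynomials of degree m


for m in range(1, 11):
    print(m, irreducible_count_formula(2, m), irreducible_count_bruteforce_gf2(m))
```

For p = 2 both methods give 2, 1, 2, 3, 6, 9, 18, 30, 56, 99 for m = 1, ..., 10.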

One can then deduce that $N_m > 0$ for all prime powers $p^m$; that is, there exists
an irreducible polynomial mod p of every degree $\ge 1$. This is useful since, as will




be explained in appendix 7E, an irreducible polynomial mod p of
degree n can be used to construct a finite field with $p^n$ elements.


An alternative approach uses generating functions. We define
$$F(t) = \sum_{h(x) \text{ monic}} t^{\deg h(x)} = \sum_{m \ge 0} \sum_{h(x) \text{ monic of degree } m} t^m = \sum_{m \ge 0} p^m t^m = \frac{1}{1 - pt}.$$
On the other hand
$$t^{\deg h(x)} = \prod_{\substack{p(x) \text{ monic irreducible}\\ k \ge 1,\ p(x)^k \,\|\, h(x)}} t^{\deg(p(x)^k)} = \prod_{\substack{p(x) \text{ monic irreducible}\\ k \ge 1,\ p(x)^k \,\|\, h(x)}} t^{k \deg(p(x))},$$
and so, summing $t^{\deg h(x)}$ over all monic polynomials h(x), we obtain (using the
Fundamental Theorem of Arithmetic for polynomials to ensure unique factorization)
$$F(t) = \prod_{p(x) \text{ monic irreducible}} \left(1 + t^{\deg(p(x))} + t^{2\deg(p(x))} + \cdots\right) = \prod_{d \ge 1} \prod_{\substack{p(x) \text{ monic irreducible}\\ \text{of degree } d}} (1 - t^d)^{-1} = \prod_{d \ge 1} (1 - t^d)^{-N_d}.$$
We have therefore proved the remarkable identity
$$F(t) = \frac{1}{1 - pt} = \prod_{d \ge 1} (1 - t^d)^{-N_d},$$


which we have obtained much like we obtained Dirichlet series and their Euler
products in appendix 4B. Now we take the logarithmic derivative to obtain
$$\frac{F'(t)}{F(t)} = \frac{p}{1 - pt} = \sum_{d \ge 1} \frac{dN_d\, t^{d-1}}{1 - t^d}.$$
Multiplying through by t and expanding we have
$$\sum_{n \ge 1} p^n t^n = \frac{pt}{1 - pt} = \sum_{d \ge 1} \frac{dN_d\, t^d}{1 - t^d} = \sum_{d \ge 1} dN_d \sum_{m \ge 1} t^{dm} = \sum_{n \ge 1} \Big(\sum_{d\mid n} dN_d\Big) t^n.$$
Comparing the coefficients of $t^n$ on both sides we again obtain the identity (4.12.2),
which then leads to our exact formula for the number of irreducible polynomials of
degree d.
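The product identity $\frac{1}{1-pt} = \prod_{d \ge 1}(1 - t^d)^{-N_d}$ can be verified coefficient by coefficient up to a fixed degree M, since the factors with $d > M$ do not affect those coefficients. This Python sketch computes $N_d$ from (4.12.3), expands each factor as $(1 - t^d)^{-N_d} = \sum_{k \ge 0} \binom{N_d + k - 1}{k} t^{dk}$, and multiplies the truncated series; the names and the choice p = 3, M = 12 are our own.

```python
from math import comb


def mobius(n):
    """Möbius function by trial division (n >= 1)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def irreducible_counts(p, M):
    """N_d for 1 <= d <= M, from d * N_d = sum over e | d of mu(e) * p**(d // e)."""
    return {
        d: sum(mobius(e) * p ** (d // e) for e in range(1, d + 1) if d % e == 0) // d
        for d in range(1, M + 1)
    }


def product_side(p, M):
    """Coefficients of t^0, ..., t^M in the product over d >= 1 of (1 - t^d)^(-N_d)."""
    N = irreducible_counts(p, M)
    coeffs = [1] + [0] * M
    for d in range(1, M + 1):
        # (1 - t^d)^(-N_d) = sum over k >= 0 of C(N_d + k - 1, k) t^(d k)
        factor = [0] * (M + 1)
        for k in range(M // d + 1):
            factor[d * k] = comb(N[d] + k - 1, k)
        # multiply the two truncated power series
        coeffs = [sum(coeffs[i] * factor[j - i] for i in range(j + 1)) for j in range(M + 1)]
    return coeffs


p, M = 3, 12
print(product_side(p, M) == [p ** m for m in range(M + 1)])  # True
```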


Exercise 4.12.2. Prove that $N_m \le p^m/m$.


Appendix 4D. The harmonic sum and the divisor function


The number of divisors of integers, $\tau(n)$, varies considerably with n. The data
begins
$$\tau(1) = 1,\ \tau(2) = 2,\ \tau(3) = 2,\ \tau(4) = 3,\ \tau(5) = 2,\ \tau(6) = 4,\ \tau(7) = 2,$$
$$\tau(8) = 4,\ \tau(9) = 3,\ \tau(10) = 4,\ \tau(11) = 2,\ \tau(12) = 6, \ldots,$$
and sampling a bit further along:
$$\tau(101) = 2,\ \tau(102) = 8,\ \tau(103) = 2,\ \tau(104) = 8,\ \tau(105) = 8,$$
$$\tau(106) = 4,\ \tau(107) = 2,\ \tau(108) = 12,\ \tau(109) = 2, \ldots.$$
The values of $\tau(n)$ bounce around as n grows, often a power of 2, without seeming
to be close to some smooth function.⁷ On the other hand, it is not difficult to
show that the “average” of $\tau(n)$ up to x is well-approximated by a smooth, familiar
function, namely $\log x$:


4.13. The average number of divisors 


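The average of $\tau(n)$ up to $x$ is given by, using the definition of $\tau(n)$,
$$\frac1x\sum_{1\le n\le x}\tau(n)=\frac1x\sum_{n\le x}\ \sum_{d\mid n}1=\frac1x\sum_{d\le x}\ \sum_{\substack{n\le x\\ d\mid n}}1=\frac1x\sum_{d\le x}\Big[\frac xd\Big].$$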

The first equality follows as $\tau(n)=\sum_{d\mid n}1$, the second by swapping the order of summation, and the third by exercise 1.7.6. Now $[x/d]$ is awkward to work with, so we replace it by something simpler. From the inequalities $x/d-1<[x/d]\le x/d$ we therefore obtain
$$\frac1x\sum_{d\le x}\Big(\frac xd-1\Big)<\frac1x\sum_{1\le n\le x}\tau(n)\le\frac1x\sum_{d\le x}\frac xd,$$
which, since $\frac1x\sum_{d\le x}1=[x]/x\le 1$, implies
$$\sum_{d\le x}\frac1d-1<\frac1x\sum_{1\le n\le x}\tau(n)\le\sum_{d\le x}\frac1d. \tag{4.13.1}$$

⁷ Often one wishes to say that a given complicated mathematical function “is well-approximated” by a function that is “smooth” and simple to describe.


The upper and lower bounds here differ by just 1; it’s amazing that the average of $\tau(n)$ can be nailed down so precisely, given how much $\tau(n)$ varies. Next we need a smooth function that provides a good estimate for $\sum_{d\le x}1/d$ as $x$ varies.
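The sandwich (4.13.1) is easy to test numerically. The following Python sketch, with hypothetical helper names, computes $\tau(n)$ for all $n\le x$ by adding 1 at every multiple of each $d\le x$ (the same swap of summation used above) and compares the average with the two bounds. It assumes $x$ is a positive integer.

```python
def divisor_counts(x):
    """tau[n] for 1 <= n <= x: each d <= x adds 1 at every multiple of d."""
    tau = [0] * (x + 1)
    for d in range(1, x + 1):
        for n in range(d, x + 1, d):
            tau[n] += 1
    return tau


def divisor_average_bounds(x):
    """Return (lower, average, upper) from (4.13.1) for an integer x >= 1."""
    tau = divisor_counts(x)
    average = sum(tau[1:]) / x
    harmonic = sum(1 / d for d in range(1, x + 1))
    return harmonic - 1, average, harmonic


for x in (10, 100, 1000):
    print(x, divisor_average_bounds(x))
```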


4.14. The harmonic sum 


We wish to obtain good approximations to the value of the harmonic sums
$$L_N:=\frac11+\frac12+\frac13+\cdots+\frac1N$$
as $N$ grows. As an example we now describe $L_4$ in terms of the area of the shaded region in the graph below.


Figure 4.1. The harmonic sum, $L_4$, presented as an area, bounded below by the curve $y=1/x$.


The first shaded rectangle, with corners at $(1,0)$, $(2,0)$, $(2,1)$, and $(1,1)$, has area $1$; the next, with corners at $(2,0)$, $(3,0)$, $(3,\frac12)$, and $(2,\frac12)$, has area $\frac12$, etc. So the shaded region has area $L_4=\frac11+\frac12+\frac13+\frac14$. The shaded region contains the region under the curve $y=1/x$ for $1\le x\le 5$, and so its area is larger than the area under this curve. Precisely,
$$L_4>\int_1^5\frac{dt}{t}=\log 5.$$
This argument generalizes to obtain that, for any integer $N\ge 1$,
$$L_N=\int_1^{N+1}\frac{dt}{[t]}>\int_1^{N+1}\frac{dt}{t}=\log(N+1)>\log N.$$
To obtain an upper bound on $L_N$ we shift our boxes one to the left.


Figure 4.2. The harmonic sum, $L_4$, presented as an area, bounded above by the curve $y=1/x$.


Now the area under the curve $y=1/x$ for $0<x\le 4$ contains the shaded region and so provides an upper bound for $L_4$. However the area under $y=1/x$ for $0<x\le 1$ is $\infty$, so this does not yield a useful upper bound on $L_4$. The trick is to treat the first term in the sum, 1, separately. Therefore
$$L_4=1+\frac12+\frac13+\frac14<1+\int_1^4\frac{dt}{t}=1+\log 4,$$
which is not much bigger than $\log 5$. In general, as $1/[x+1]\le 1/x$ for all $x>0$, we have
$$L_N=1+\int_1^N\frac{dx}{[x+1]}\le 1+\int_1^N\frac{dx}{x}=1+\log N.$$


Therefore $\log N$ is a very accurate approximation to $L_N$ (as $0<L_N-\log N\le 1$ for all $N\ge 1$), but we can and will do better! Inserting these bounds for the harmonic sum into (4.13.1) we obtain
$$\log x-1<\frac1x\sum_{1\le n\le x}\tau(n)\le\log x+1$$
for any number $x\ge 1$. Therefore the mean value of $\tau(n)$, over all $n\le x$, is $\log x$ with an error of at most 1, a far better estimate than we might have guessed was feasible given how this section started. We will obtain even better bounds in section 4.15 of appendix 4D.
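A quick numerical check of the bounds $\log(N+1)<L_N\le 1+\log N$, and of the resulting estimate for the mean of $\tau(n)$, might look like this Python sketch (names are ours). For real $x$ it uses $\sum_{n\le x}\tau(n)=\sum_{d\le x}[x/d]$ from section 4.13.

```python
import math


def harmonic(N):
    """L_N = 1 + 1/2 + ... + 1/N, summed with math.fsum to limit rounding error."""
    return math.fsum(1 / n for n in range(1, N + 1))


def mean_divisor_count(x):
    """(1/x) * sum_{n <= x} tau(n) for real x >= 1, via sum_{d <= x} [x/d]."""
    return sum(math.floor(x / d) for d in range(1, math.floor(x) + 1)) / x


for N in (1, 10, 100, 1000):
    # lower bound, L_N, upper bound
    print(N, math.log(N + 1), harmonic(N), 1 + math.log(N))
```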



In the next exercise we will prove that $L_N-\log N$ tends to a limit as $N$ goes to infinity; that is,
$$\lim_{N\to\infty}\Big(\frac11+\frac12+\frac13+\cdots+\frac1N-\log N\Big)$$
exists. This limit is usually denoted by $\gamma$ and is called the Euler-Mascheroni constant.⁸ It can be shown that $\gamma=0.5772156649\ldots$.
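To watch $L_N-\log N$ settle down, the sketch below (an illustration only; the decimal value of $\gamma$ is hard-coded to double precision, an assumption rather than something computed here) checks numerically that $a_n=L_n-\log n$ decreases and that $0<a_N-\gamma<1/N$, the bound obtained by letting $n\to\infty$ in exercise 4.14.1(a).

```python
import math

EULER_GAMMA = 0.5772156649015329  # gamma to double precision (hard-coded, not computed here)


def a_n(n):
    """a_n = 1/1 + 1/2 + ... + 1/n - log n."""
    return math.fsum(1 / k for k in range(1, n + 1)) - math.log(n)


for N in (1, 10, 100, 1000, 10000):
    print(N, a_n(N) - EULER_GAMMA, 1 / N)  # the gap stays below 1/N
```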


Exercise 4.14.1. Let $a_n=\frac11+\frac12+\frac13+\cdots+\frac1n-\log n$ for each integer $n\ge 1$.
(a) By modifying the argument above, show that if $n>m$, then
$$0<a_m-a_n<\log(1+1/m)-\log(1+1/n)<1/m.$$
Therefore $(a_n)_{n\ge 1}$ is a Cauchy sequence and converges to a limit, as desired.
(b) Deduce that $0<\frac11+\frac12+\cdots+\frac1N-\log N-\gamma<\frac1N$.
(c)* Prove that 


Inserting this estimate into (4.13.1) does not give a much better approximation for the average of $\tau(n)$; the problem is that there is a difference of 1 between the two sides of (4.13.1), which we will improve in the next section.

In the next exercise we obtain an analogous estimate for N!. 

Exercise 4.14.2.' Notice that log is a monotone increasing function for $x>1$. Assume that $N>M$.
(a) Justify the lower bound $\log n>\int_{n-1}^{n}\log t\,dt$ and deduce that $N!/M!>(N/e)^N/(M/e)^M$.
(b) Prove an analogous upper bound on $\log n$ and deduce that $N!/M!<(N/e)^{N+1}/(M/e)^{M+1}$.


(c) Deduce that 1//N < nif (2)" ven < VN. 
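The inequalities $(N/e)^N/(M/e)^M<N!/M!<(N/e)^{N+1}/(M/e)^{M+1}$ from parts (a) and (b) can be checked numerically before proving them. This Python sketch (function names hypothetical) compares $\log(N!/M!)=\sum_{M<n\le N}\log n$ with the logarithms of the two bounds for small $N>M\ge 1$.

```python
import math


def log_ratio(N, M):
    """log(N!/M!) = log(M + 1) + ... + log(N)."""
    return math.fsum(math.log(n) for n in range(M + 1, N + 1))


def log_lower(N, M):
    """Logarithm of (N/e)^N / (M/e)^M."""
    return N * (math.log(N) - 1) - M * (math.log(M) - 1)


def log_upper(N, M):
    """Logarithm of (N/e)^(N+1) / (M/e)^(M+1)."""
    return (N + 1) * (math.log(N) - 1) - (M + 1) * (math.log(M) - 1)


print(log_lower(20, 5), log_ratio(20, 5), log_upper(20, 5))
```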


4.15. Dirichlet’s hyperbola trick 


In the argument in section 4.13 of appendix 4D we noted that $d$ is a divisor of exactly $[x/d]$ integers up to $x$ and then approximated this using the inequalities $x/d-1<[x/d]\le x/d$. This difference of 1 between the upper and lower bounds gives rise to the difference 1 between the upper and lower bounds in (4.13.1). The difference 1 is negligible compared to the “main term” $x/d$ when $x/d$ is large but seems more relevant when $x/d$ is small, that is, if $d$ is close to $x$. For example, if $x/2<d\le x$, the actual value of $[x/d]$ is 1, and our bounds differ by that much, a source of much of the error term in our approximation to the average. So can we somehow avoid this source of most of the error in our summation technique?


Dirichlet observed that if $d\mid n$ and $d$ is close to $n$, then its complementary divisor, $n/d$, must be small. Inspired by this observation, Dirichlet counted the number of divisors using only the smaller number of each pair. Therefore, writing $n=ab$, we have
$$\tau(n)=\sum_{\substack{a,b\ge 1\\ ab=n}}1=\sum_{\substack{b\ge a\ge 1\\ ab=n}}1+\sum_{\substack{a>b\ge 1\\ ab=n}}1,$$
so that
$$\frac1x\sum_{1\le n\le x}\tau(n)=\frac1x\sum_{\substack{b\ge a\ge 1\\ ab\le x}}1+\frac1x\sum_{\substack{a>b\ge 1\\ ab\le x}}1=\frac1x\sum_{1\le a\le\sqrt x}\ \sum_{a\le b\le x/a}1+\frac1x\sum_{1\le b\le\sqrt x}\ \sum_{b<a\le x/b}1$$
$$=\frac1x\sum_{1\le a\le\sqrt x}\big([x/a]+1-a\big)+\frac1x\sum_{1\le b\le\sqrt x}\big([x/b]-b\big).$$

⁸ One might reasonably ask whether $\gamma$ can be described other than as this limit. Although there are other descriptions of $\gamma$ (usually as an infinite integral, as in exercise 4.14.1(c)), none are easy to work with, so we do not even know whether $\gamma$ is rational, a longstanding open question. However $\gamma$ is a fundamental constant appearing in all sorts of important settings in number theory, so we would dearly like to understand it better.
1<a<Ve 1<b< Vz 


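Now that $a$ (and $b$) is $\le\sqrt x$, the approximation $x/a$ for $[x/a]$ is far better. Writing $N=[\sqrt x]$ and using the upper bound $[x/c]\le x/c$ for $c=a$ and $b$, we obtain
$$\frac1x\sum_{1\le n\le x}\tau(n)\le\frac1x\sum_{1\le a\le N}\Big(\frac xa+1-a\Big)+\frac1x\sum_{1\le b\le N}\Big(\frac xb-b\Big)=2\sum_{1\le d\le N}\frac1d-\frac1x\sum_{1\le d\le N}(2d-1)<2\Big(\log N+\gamma+\frac1N\Big)-\frac{N^2}{x}$$
by exercise 4.14.1(b), since $\sum_{1\le d\le N}(2d-1)=N^2$. Using the lower bound $[x/c]>x/c-1$, the analogous argument yields
$$\frac1x\sum_{1\le n\le x}\tau(n)>2(\log N+\gamma)-\frac{(N+1)^2}{x}.$$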


Since $\sqrt x-1<N\le\sqrt x$, the quantities $2\log N-\log x$ and $N^2/x-1$ are both at most a constant multiple of $1/\sqrt x$ in absolute value. Therefore, if $x$ is sufficiently large, then
$$\Big|\frac1x\sum_{1\le n\le x}\tau(n)-(\log x+2\gamma-1)\Big|<\frac{2}{\sqrt x},$$
a tiny error term. Getting the main term, $\log x+2\gamma-1$, was not difficult, but getting the upper bound on the error term is not so easy!⁹ How much can the bounds on the error term be improved? Calculations suggest bounds like $\le c/x^{2/3}$ or even better. What is the best possible? Can we get a good lower bound on the error term? These are fundamental questions (for this and other problems) that intrigue analytic number theorists.
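Summing the hyperbola-trick display exactly, rather than bounding it, gives $\sum_{n\le x}\tau(n)=2\sum_{a\le N}[x/a]-N^2$ with $N=[\sqrt x]$, since the constant terms contribute $N-2\cdot\frac{N(N+1)}{2}=-N^2$. The Python sketch below (our own; names hypothetical, $\gamma$ hard-coded) uses this to compute the divisor sum in about $\sqrt x$ steps instead of $x$, and compares the error with $2/\sqrt x$ at a few values of $x$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # hard-coded value of gamma


def divisor_sum_direct(x):
    """sum_{n <= x} tau(n) = sum_{d <= x} [x/d], about x steps."""
    return sum(x // d for d in range(1, x + 1))


def divisor_sum_hyperbola(x):
    """Dirichlet's hyperbola trick: 2 * sum_{a <= N} [x/a] - N^2, N = [sqrt(x)], about sqrt(x) steps."""
    N = math.isqrt(x)
    return 2 * sum(x // a for a in range(1, N + 1)) - N * N


for x in (10**2, 10**4, 10**6):
    error = divisor_sum_hyperbola(x) / x - (math.log(x) + 2 * EULER_GAMMA - 1)
    print(x, error, 2 / math.sqrt(x))
```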


People like Euler and Gauss were not as interested in the explicit average as in determining what the summed function grows like. Thus they would write something like

$\tau(n)$ grows, on average, like $\log n+2\gamma$.


⁹ Although both the upper and lower bounds are obviously at most some multiple of $1/\sqrt x$, getting a sharp bound on that multiple is not easy. And who really cares? Typically in these situations we aim to obtain a multiple of $1/\sqrt x$ that is not difficult to prove, rather than aim for a multiple that takes a lot of work to establish when knowing this improvement adds little to our understanding.



To see that this is true, we observe that exercise 4.14.2 implies, after the last displayed equation, that
$$\Big|\sum_{n\le x}\big(\tau(n)-\log n-2\gamma\big)\Big|\le 5\sqrt x$$
for all sufficiently large $x$.


In the final exercise we show how to obtain an even better estimate for N!. 


Exercise 4.15.1.4 Let $M_N:=\log 1+\log 2+\cdots+\log N$ and $x_N:=M_N-(N+\frac12)(\log N-1)$.
(a) Prove that there exists a constant $c_1>0$ such that for all integers $n\ge 1$ we have
$$\Big|\int_{n-\frac12}^{n+\frac12}\log t\,dt-\log n\Big|\le\frac{c_1}{n^2}.$$
(b) Deduce that if $M>N$, then $|x_M-x_N|<c_2/N$ for some constant $c_2>0$.
(c) Prove that $(x_N)_{N\ge 1}$ tends to a limit.
(d) Deduce that there exists a constant $c_0>0$ such that
$$\lim_{N\to\infty}\frac{N!}{(N/e)^N\sqrt{c_0N}}=1.$$
Stirling showed that $c_0=2\pi$ (this amazingly accurate approximation to $N!$ is known as Stirling’s formula).
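Numerically, $x_N$ does converge, and with $c_0=2\pi$ the ratio in (d) approaches 1. In this Python sketch (names hypothetical), $\log N!$ is computed with the standard library's `math.lgamma(N + 1)` to avoid huge integers. The limit $\frac12\log(2\pi)+\frac12$ used in the check is our own deduction from (d) with $c_0=2\pi$, since $(N+\frac12)(\log N-1)=N(\log N-1)+\frac12\log N-\frac12$.

```python
import math


def x_N(N):
    """x_N = log N! - (N + 1/2)(log N - 1), with log N! = lgamma(N + 1)."""
    return math.lgamma(N + 1) - (N + 0.5) * (math.log(N) - 1)


def stirling_ratio(N):
    """N! / ((N/e)^N * sqrt(2*pi*N)), computed through logarithms to avoid overflow."""
    log_ratio = math.lgamma(N + 1) - N * (math.log(N) - 1) - 0.5 * math.log(2 * math.pi * N)
    return math.exp(log_ratio)


for N in (10, 100, 1000, 10**6):
    print(N, x_N(N), stirling_ratio(N))
print(0.5 * math.log(2 * math.pi) + 0.5)  # the limit of x_N when c_0 = 2*pi
```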


Appendix 4E. Cyclotomic polynomials


We give here one final application of the Möbius inversion formula to construct a family of polynomials that are particularly important in number theory, as we will see in later sections.


4.16. Cyclotomic polynomials 


Every non-zero complex number can be written in the form $re^{2i\pi\theta}$ where $r$ is a positive real number and $\theta\in\mathbb R/\mathbb Z$. If this is a root of $x^m-1$, then $r^me^{2i\pi m\theta}=1$. Taking absolute values we see that $r^m=1$, and therefore $r=1$. Since $e^{2i\pi t}=1$ if and only if $t\in\mathbb Z$, we have $e^{2i\pi m\theta}=1$ if and only if $m\theta\in\mathbb Z$, that is,
$$\theta=0\ \text{or}\ \tfrac1m\ \text{or}\ \tfrac2m\ \text{or}\ \ldots\ \text{or}\ \tfrac{m-1}{m}\quad(\text{in }\mathbb R/\mathbb Z).$$
The first of these yields the root 1; the second gives the root $\zeta_m=e^{2i\pi/m}$. The others are the powers of $\zeta_m$; that is,
$$1,\ \zeta_m,\ \zeta_m^2,\ \ldots,\ \zeta_m^{m-1}\ \text{are the } m \text{ distinct roots of } x^m-1.$$
These are the $m$th roots of unity.
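This description of the roots of unity can be checked with floating-point complex numbers. The Python sketch below (our illustration, using the standard `cmath` module; the function name is hypothetical) builds $\zeta_m^k=e^{2i\pi k/m}$ for $0\le k<m$; the checks hold only up to rounding error.

```python
import cmath


def roots_of_unity(m):
    """[zeta_m^k for 0 <= k < m], where zeta_m = e^{2 i pi / m}, in floating point."""
    return [cmath.exp(2j * cmath.pi * k / m) for k in range(m)]


for z in roots_of_unity(6):
    print(z, abs(z ** 6 - 1))  # each residual is at rounding-error size
```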


Exercise 4.16.1. (a) Show that $\zeta_m^i=\zeta_m^j$ if and only if $i\equiv j\pmod m$.
(b) Prove that $x^m-1=(x-1)(x-\zeta_m)(x-\zeta_m^2)\cdots(x-\zeta_m^{m-1})$.
(c) Deduce that $x^m-y^m=(x-y)(x-\zeta_my)\cdots(x-\zeta_m^{m-1}y)$.
(d) Deduce that if $m$ is odd, then $x^m+y^m=(x+y)(x+\zeta_my)\cdots(x+\zeta_m^{m-1}y)$.


If $\alpha$ is an $m$th root of unity, but not an $r$th root of unity for any $r$, $1\le r\le m-1$, then $\alpha$ is a primitive $m$th root of unity. The cyclotomic polynomials are defined as
$$\phi_m(x)=\prod_{\substack{\alpha\ \text{a primitive}\\ m\text{th root of unity}}}(x-\alpha).$$



Since every root of $\phi_m(x)$ is a root of $x^m-1$, and $\phi_m(x)$ has distinct roots, we deduce that $\phi_m(x)$ divides $x^m-1$. In fact $\phi_m(x)$ divides $x^n-1$ if and only if $m$ divides $n$: If $m$ divides $n$, this follows since $\phi_m(x)$ divides $x^m-1$, which divides $x^n-1$. In the other direction, suppose that $\phi_m(x)$ divides $x^n-1$ and let $\alpha$ be a primitive $m$th root of unity. Then $\alpha$ is a root of $x^n-1$, so $\alpha^n=1$, and also $\alpha^m=1$. Now select integers $u,v$ such that $um+vn=\gcd(m,n)$, and so $\alpha^{\gcd(m,n)}=(\alpha^m)^u(\alpha^n)^v=1$. Now $\gcd(m,n)\le m$ and therefore $\gcd(m,n)=m$ by the minimality of $m$, and so $m$ divides $n$.

The polynomials $\phi_m(x)$ with $m\mid n$ each have distinct roots, and no two of them share a root (each root of unity is a primitive $m$th root of unity for exactly one $m$), and so $\prod_{m\mid n}\phi_m(x)$ divides $x^n-1$. On the other hand the roots of $x^n-1$ are all $n$th roots of unity and hence are each a primitive $m$th root of unity for some $m$ dividing $n$. Since $x^n-1$ has no repeated roots (or else the root would also be a root of its derivative, $nx^{n-1}$), we deduce that
$$x^n-1=\prod_{m\mid n}\phi_m(x).$$
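The identity $x^n-1=\prod_{m\mid n}\phi_m(x)$ gives a practical way to compute $\phi_n(x)$ exactly: divide $x^n-1$ by $\phi_m(x)$ for each proper divisor $m$ of $n$. Here is a minimal Python sketch of that approach (helper names are ours; polynomials are integer coefficient lists, lowest degree first). Since every divisor is monic, the long division stays in the integers.

```python
from functools import lru_cache
from math import gcd


def poly_mul(f, g):
    """Product of integer polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_divexact(num, den):
    """Quotient num / den for a monic den; raise ValueError if the remainder is non-zero."""
    num = list(num)
    dn = len(den) - 1
    quotient = [0] * (len(num) - dn)
    for i in range(len(num) - 1, dn - 1, -1):
        c = num[i]  # den is monic, so the next quotient coefficient is just num[i]
        quotient[i - dn] = c
        for j in range(dn + 1):
            num[i - dn + j] -= c * den[j]
    if any(num[:dn]):
        raise ValueError("division is not exact")
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """phi_n(x) as a coefficient tuple, from x^n - 1 = prod_{m | n} phi_m(x)."""
    poly = [-1] + [0] * (n - 1) + [1]  # x^n - 1
    for m in range(1, n):
        if n % m == 0:
            poly = poly_divexact(poly, cyclotomic(m))
    return tuple(poly)


def euler_phi(n):
    """Euler's function, by counting 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def product_over_divisors(n):
    """prod_{m | n} phi_m(x), which should equal x^n - 1."""
    out = [1]
    for m in range(1, n + 1):
        if n % m == 0:
            out = poly_mul(out, list(cyclotomic(m)))
    return out


print(cyclotomic(6), cyclotomic(12))  # (1, -1, 1) (1, 0, -1, 0, 1)
```

For instance `cyclotomic(105)` has a coefficient equal to $-2$; 105 is the smallest $n$ for which a coefficient outside $\{-1,0,1\}$ appears.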


Exercise 4.16.2. (a) Show that $\sum_{m\mid n}\deg\phi_m=n$ for all integers $n\ge 1$.
(b) Deduce that $\phi_m(x)$ has degree $\phi(m)$ for each $m\ge 1$.


Exercise 4.16.3. (a) Show that if $m$ divides $n$, then $(\zeta_n^k)^m=1$ if and only if $n/m$ divides $k$.
(b) Deduce that if $m$ divides $n$, then $\zeta_n^k$ is a primitive $m$th root of unity if and only if there exists an integer $r$, coprime with $m$, for which $k=(n/m)r$.
(c) Show that the set of primitive $m$th roots of unity is $\{\zeta_m^j:\ 1\le j\le m \text{ and } (j,m)=1\}$.
(d) Deduce that $\phi_m(x)$ has degree $\phi(m)$.


Exercise 4.16.4. By using the Möbius inversion formula, prove that
$$\phi_m(t)=\prod_{d\mid m}(t^d-1)^{\mu(m/d)}.$$
Exercise 4.16.5. (a) Prove that $\prod_{m\mid n,\ m>1}\phi_m(1)=n$.
(b)* Deduce that if $m>1$, then $\log\phi_m(1)=\Lambda(m)$, the von Mangoldt function of appendix 4B.
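Exercises 4.16.4 and 4.16.5 can be explored with a second sketch that builds $\phi_m(x)$ directly from the Möbius product, multiplying the factors $x^d-1$ with $\mu(m/d)=1$ and dividing by those with $\mu(m/d)=-1$. It then evaluates $\phi_m(1)$ as the sum of the coefficients and compares $\log\phi_m(1)$ with $\Lambda(m)$. This is our own Python illustration; all names are hypothetical, and $\mu$ is computed by trial division.

```python
import math


def mobius(n):
    """Mobius function mu(n), by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # d^2 divided the original n
            result = -result
        d += 1
    return -result if n > 1 else result


def poly_mul(f, g):
    """Product of integer coefficient lists, lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_divexact(num, den):
    """Quotient num / den for a monic den; raise ValueError if the remainder is non-zero."""
    num = list(num)
    dn = len(den) - 1
    quotient = [0] * (len(num) - dn)
    for i in range(len(num) - 1, dn - 1, -1):
        c = num[i]
        quotient[i - dn] = c
        for j in range(dn + 1):
            num[i - dn + j] -= c * den[j]
    if any(num[:dn]):
        raise ValueError("division is not exact")
    return quotient


def cyclotomic_mobius(m):
    """phi_m(x) = prod_{d | m} (x^d - 1)^{mu(m/d)}, as a coefficient list."""
    num, den = [1], [1]
    for d in range(1, m + 1):
        if m % d == 0:
            factor = [-1] + [0] * (d - 1) + [1]  # x^d - 1
            mu = mobius(m // d)
            if mu == 1:
                num = poly_mul(num, factor)
            elif mu == -1:
                den = poly_mul(den, factor)
    return poly_divexact(num, den)


def von_mangoldt(m):
    """Lambda(m) = log p if m = p^k for a prime p and k >= 1, otherwise 0."""
    for p in range(2, m + 1):
        if m % p == 0:  # p is the smallest prime factor of m
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return 0.0


for m in range(2, 13):
    value_at_1 = sum(cyclotomic_mobius(m))  # phi_m(1) is the sum of the coefficients
    print(m, value_at_1, math.log(value_at_1), von_mangoldt(m))
```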


Chapter 5 


The distribution of prime numbers


Once one begins to determine which integers are primes, one quickly finds that 
there are many. Are there infinitely many? One notices that the primes seem to 
make up a decreasing proportion of the positive integers. Can we explain this? Can 
we give a formula for how many primes there are up to a given point? Or at least 
give a good estimate? 


When we write out the primes there seem to be patterns, though the patterns 
rarely persist for long. Can we find patterns that do persist? Is there a formula 
that describes all of the primes? Or at least some of them? 

Is it possible to recognize prime numbers quickly and easily? 


These questions motivate different parts of this chapter and of chapter 10. 


5.1. Proofs that there are infinitely many primes 


The first known proof appears in Euclid’s Elements, Book IX, Proposition 20: 


Theorem 5.1. There are infinitely many primes. 


Proof #1.¹ Suppose that there are only finitely many primes, which we will denote by $2=p_1<p_2=3<\cdots<p_k$. What are the prime factors of $p_1p_2\cdots p_k+1$? Since this number is $>1$ it must have a prime factor by the Fundamental Theorem of Arithmetic, and this must be $p_j$ for some $j$, $1\le j\le k$, since all primes are contained amongst $p_1,p_2,\ldots,p_k$. But then $p_j$ divides both $p_1p_2\cdots p_k$ and $p_1p_2\cdots p_k+1$, and hence $p_j$ divides their difference, 1, by exercise 1.1.1(c), which is impossible.


¹ Not until relatively recently has there been mathematical notation to describe a collection of objects, for example, $p_1,p_2,\ldots,p_k$. Neither Euclid nor Fermat had subscripts or “...” or “etc.” (Gauss used “&c”). So instead the reader had to infer from the context how many objects the author meant. In Euclid’s Elements, he writes that he assumes α, β, γ denote all of the prime numbers and then gives, in terms of ideas, the same proof as here. The reader had to understand that in writing “α, β, γ”, Euclid meant an arbitrary number of primes, not just three!



There are many variants on Euclid’s proof. For example: 


Exercise 5.1.1 (Proof #2). Suppose that there are only finitely many primes, the largest of 
which is n > 2. Show that this is impossible by considering the prime factors of n! — 1. 


Other variants include Furstenberg’s curious proof using point-set topology (see appendix 5F). These all boil down to showing that there exists an integer $q>1$ that is not divisible by any of a given finite set of primes $p_1,\ldots,p_k$. If $m=p_1p_2\cdots p_k$, then we wish to show there exists an integer $q>1$ with $(q,m)=1$, and there are $\phi(m)-1$ such integers up to $m$. There is therefore such an integer by the formula in Theorem 4.1 once $m>2$.


Exercise 5.1.2. Prove that there are infinitely many composite numbers. 


Euclid’s proof that there are infinitely many primes is a “proof by contradic- 
tion”, showing that it is impossible that there are finitely many, and so does not 
suggest how one might find infinitely many. We can use the following constructive 
technique to determine infinitely many primes: 


Lemma 5.1.1. Suppose that $a_1<a_2<\cdots$ is an infinite sequence of pairwise coprime positive integers, and let $p_n$ be a prime factor of $a_n$ for each $n\ge 2$. Then $p_2,p_3,\ldots$ is an infinite sequence of distinct primes.


Proof. If $p_m=p_n$ with $1<m<n$, then $p_m$ divides both $a_m$ and $a_n$ and so divides $(a_m,a_n)=1$, which is impossible.


By Lemma 5.1.1 we need only find an infinite sequence of pairwise coprime positive integers to obtain infinitely many primes. This can be achieved by modifying Euclid’s construction. We define the sequence
$$a_1=2,\quad a_2=3\quad\text{and then}\quad a_n=a_1a_2\cdots a_{n-1}+1\ \text{ for each } n>2.$$
Now if $m<n$, then $a_n\equiv 1\pmod{a_m}$ and so $(a_m,a_n)=(a_m,1)=1$ by exercise 2.1.5, as desired. Therefore, by Lemma 5.1.1, we can take a prime factor $p_n$ of each $a_n$ with $n>1$ to obtain an infinite sequence of prime numbers.
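The sequence $a_n$ grows very fast ($2, 3, 7, 43, 1807, \ldots$), but its first few terms can be checked directly. This Python sketch (names hypothetical) builds the sequence, confirms pairwise coprimality, and takes the smallest prime factor of each term as a choice of the prime factor $p_n$ used above, though any prime factor would do.

```python
from math import gcd


def euclid_sequence(k):
    """First k terms of a_1 = 2, a_2 = 3, a_n = a_1 a_2 ... a_{n-1} + 1."""
    seq, running_product = [2], 2
    while len(seq) < k:
        nxt = running_product + 1
        seq.append(nxt)
        running_product *= nxt
    return seq[:k]


def smallest_prime_factor(n):
    """Smallest prime factor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


terms = euclid_sequence(6)
print(terms)  # [2, 3, 7, 43, 1807, 3263443]
print([smallest_prime_factor(a) for a in terms])
```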


Fermat conjectured that the integers $F_n=2^{2^n}+1$ are primes for all $n\ge 0$. His claim starts off correct: 3, 5, 17, 257, 65537 are all prime, but his conjecture is false for $F_5=641\times 6700417$, as Euler famously noted. It is an open question as to whether there are infinitely many primes of the form $F_n$.² Using the identity
$$F_n=F_0F_1\cdots F_{n-1}+2\ \text{ for each } n\ge 1 \tag{5.1.1}$$
we see that if $m<n$, then $F_n\equiv 2\pmod{F_m}$ so that $(F_m,F_n)=(F_m,2)=1$, the last equality since each $F_m$ is odd. Therefore, by Lemma 5.1.1, we can take a prime factor $p_n$ of each $F_n$ to obtain an infinite sequence of prime numbers.³
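A few lines of Python (a sketch; `fermat` is our own name) confirm (5.1.1) and the pairwise coprimality of Fermat numbers for small $n$, along with Euler's factorization of $F_5$.

```python
from math import gcd, prod


def fermat(n):
    """F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


for n in range(1, 7):
    # identity (5.1.1): F_n = F_0 F_1 ... F_{n-1} + 2
    print(n, fermat(n) == prod(fermat(k) for k in range(n)) + 2)
print(fermat(5) == 641 * 6700417)  # True
```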


These proofs that there are infinitely many primes will be generalized using 
dynamical systems in appendix 5H. 


² The only Fermat numbers known to be primes have $n\le 4$. We know that the $F_n$ are composite for $5\le n\le 30$ and for many other $n$ besides. It is always a significant moment when a Fermat number is factored for the first time. It could be that all $F_n$ with $n>4$ are composite, or they might all be prime from some sufficiently large $n$ onwards, or some might be prime and some composite. Currently, we have no way of knowing which is true.

³ This proof that there are infinitely many primes first appeared in a letter from Goldbach to Euler in July 1730.



Exercise 5.1.3. Prove (5.1.1).


Exercise 5.1.4. Suppose that $p_1=2<p_2=3<\cdots$ is the sequence of prime numbers. Use the fact that every Fermat number has a distinct prime divisor to prove that $p_n\le 2^{2^n}+1$. What can one deduce about the number of primes up to $x$?


Exercise 5.1.5. (a) Show that if $m$ is not a power of 2, then $2^m+1$ is composite by showing that $2^a+1$ divides $2^{ab}+1$ whenever $b$ is odd.
(b) Deduce that if $2^m+1$ is prime, then there exists an integer $n$ such that $m=2^n$; that is, if $2^m+1$ is prime, then it is a Fermat number $F_n=2^{2^n}+1$. (This also follows from exercise 3.9.3(b).)


Another interesting sequence is the Mersenne numbers,⁴ which take the form $M_n=2^n-1$. After exercise 3.9.2 we observed that if $M_n$ is prime, then $n$ is prime and, in our discussion of perfect numbers (section 4.2), we observed that $M_2$, $M_3$, $M_5$, and $M_7$ are each prime but $M_{11}=23\times 89$ is not. The Lucas-Lehmer test provides a relatively quick and elegant way to test whether a given $M_p$ is prime (see Corollary 10.10.1 in appendix 10C).
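For small exponents one can compare the Lucas-Lehmer test with trial division. The sketch below assumes the standard form of the test, stated here without proof: for an odd prime $p$, $M_p$ is prime if and only if $s_{p-2}\equiv 0\pmod{M_p}$, where $s_0=4$ and $s_{k+1}=s_k^2-2$. The book's own treatment is in appendix 10C; the function names here are hypothetical.

```python
def is_prime_trial(n):
    """Primality by trial division up to sqrt(n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p):
    """Assumed standard Lucas-Lehmer criterion for M_p = 2^p - 1, with p prime."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence is stated for odd p
    M = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M
    return s == 0


print([p for p in range(2, 32) if is_prime_trial(p) and lucas_lehmer(p)])  # [2, 3, 5, 7, 13, 17, 19, 31]
```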


5.2. Distinguishing primes 


We can determine whether a given integer $n$ is prime by proving that it is not composite: If a given integer $n$ is composite, then we can write it as $ab$, with two integers both $>1$. If we suppose that $a\le b$, then $a^2\le ab=n$ and so $a\le\sqrt n$. Hence $n$ must be divisible by some integer $a$ in the range $1<a\le\sqrt n$. Therefore we can test divide $n$ by every integer $a$ in this range, and we either discover a factor of $n$ or, if not, we know that $n$ must be prime. This process is called trial division and is too slow, in practice, except for relatively small integers $n$. We can slightly improve this algorithm by noting that if $p$ is a prime dividing $a$, and $a$ divides $n$, then $p$ divides $n$, so we only need to test divide by the primes up to $\sqrt n$. This is still very slow, in practice.⁵ We discuss more practical techniques in chapter 10.
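Trial division as just described fits in a few lines. In this Python sketch (names ours), `smallest_factor` tests every integer $a$ with $1<a\le\sqrt n$, using `a * a <= n` to avoid floating-point square roots.

```python
def smallest_factor(n):
    """Smallest a with 1 < a <= sqrt(n) dividing n, or None if there is none (for n >= 2)."""
    a = 2
    while a * a <= n:
        if n % a == 0:
            return a
        a += 1
    return None


def is_prime(n):
    """n is prime exactly when n >= 2 and trial division finds no factor."""
    return n >= 2 and smallest_factor(n) is None


print(smallest_factor(2047), is_prime(97))  # 23 True
```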


Trial division is a very slow way of recognizing whether an individual integer is prime, but it can be organized to be a highly efficient way to determine all of the primes up to some given point, as observed by Eratosthenes around 200 B.C.⁶


The sieve of Eratosthenes provides an efficient method for finding all of the primes up to $x$. For example, to find all the primes up to 100, we begin by writing down every integer between 2 and 100 and then deleting every composite even number; that is, one deletes (or sieves out) every second integer up to $x$ after 2.


⁴ In 1640, France was home to the great philosophers and mathematicians of the age, such as Descartes, Desargues, Fermat, and Pascal. From 1630 on, Father Marin Mersenne wrote letters to all of these luminaries, posing challenges and persuading them all to think about perfect numbers.

⁵ How slow is “slow”? If we could test divide by one prime per second, for a year, with no rest, then we could determine the primality of 17-digit numbers but not 18-digit numbers. If we used the world’s fastest computer in 2019, we could test divide 53-, but not 54-, digit numbers. In chapter 10 we will encounter much better methods that can test such a number for primality in moments.

⁶ Eratosthenes lived in Cyrene in ancient Greece, from 276 to 195 B.C. He created the grid system of latitude and longitude to draw an accurate map of the world incorporating parallels and meridians. He was the first to calculate the circumference of the earth, the tilt of the earth’s axis, and the distance from the earth to the sun (and so invented the leap day). He even attempted to assign dates to what was then ancient history (like the conquest of Troy) using available evidence.



 2   3   5   7   9
11  13  15  17  19
21  23  25  27  29
31  33  35  37  39
41  43  45  47  49
51  53  55  57  59
61  63  65  67  69
71  73  75  77  79
81  83  85  87  89
91  93  95  97  99

Deleting every even number > 2, between 2 and 100


The first undeleted integer > 2 is 3; one then deletes every composite integer 
divisible by 3; that is, one sieves out every third integer up to x after 3. The next 
undeleted integer is 5 and one sieves out every fifth integer after 5, and then every 
seventh integer after 7. 


 2   3   5   7
11  13  17  19
23  25  29
31  35  37
41  43  47  49
53  55  59
61  65  67
71  73  77  79
83  85  89
91  95  97

Then delete remaining integers > 3 that are divisible by 3

 2   3   5   7
11  13  17  19
23  29
31  37
41  43  47
53  59
61  67
71  73  79
83  89
97

and > 5 that are divisible by 5, and > 7 that are divisible by 7.

The sieve of Eratosthenes enables us to find all of the primes up to 100.


What’s left are the primes up to 100. To obtain the primes up to any given limit $x$, one keeps on going like this, finding the next undeleted integer, call it $p$, which must be prime since it is undeleted, and then deleting every $p$th integer beyond $p$ and up to $x$. We stop once $p>\sqrt x$ and then the undeleted integers are the primes $\le x$. There are about $x\log\log x$ steps⁷ in this algorithm, so it is a remarkably efficient way to find all the primes up to some given $x$,⁸ but not for finding any particular prime.
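Here is a minimal Python sketch of the sieve (function name hypothetical). One small deviation from the description: for each $p$ it starts deleting at $p^2$, since the smaller multiples of $p$ have a smaller prime factor and so were already deleted; the loop over $p$ stops once $p>\sqrt x$, as in the text.

```python
import math


def eratosthenes(x):
    """Return the primes <= x, deleting multiples of each undeleted p <= sqrt(x)."""
    if x < 2:
        return []
    undeleted = bytearray([1]) * (x + 1)
    undeleted[0] = undeleted[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if undeleted[p]:
            # p is prime; delete p^2, p^2 + p, ... (smaller multiples are already gone)
            undeleted[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if undeleted[n]]


print(len(eratosthenes(100)), len(eratosthenes(200)))  # 25 46
```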


Exercise 5.2.1. Use this method to find all of the primes up to 200. 


The number of integers left after one removes the multiples of 2 is roughly $\frac12x$, since about half of the integers up to $x$ are divisible by 2. After one then removes the multiples of 3, one expects that there are about $\frac23\cdot\frac12x$ integers left, since about a third of the odd integers up to $x$ are divisible by 3. In general removing the multiples of $p$ removes, we expect, about $1/p$ of the integers in our set and so leaves a proportion $1-\frac1p$. Therefore we expect that the number of integers left unsieved in the sieve of Eratosthenes, up to $x$, after sieving by the primes up to $y$, is about
$$x\prod_{p\le y}\Big(1-\frac1p\Big).$$

⁷ How should one think about an expression like $\log\log x$? It goes to $\infty$ as $x$ does, but it is a very slow-growing function of $x$. For example, if $x=10^{100}$, far more than the current estimate for the number of atoms in the universe, then $\log\log x<5.5$. Dan Shanks once wrote that “$\log\log x$ goes to infinity with great dignity.”

⁸ In practice, this algorithm determines which of the first $x$ integers are prime in no more than $6x$ steps.


The product $\prod_{p\le y}(1-\frac1p)$ is well-approximated by $e^{-\gamma}/\log y$, where $\gamma$ is the Euler-Mascheroni constant discussed in section 4.14 of appendix 4D.⁹ The logarithm, used here and elsewhere in this book, is the natural logarithm.

When we take $y=\sqrt x$, then only 1 and the primes up to $x$ should be left in the sieve of Eratosthenes, and so one might guess from this analysis of sieve methods (using $\log\sqrt x=\frac12\log x$) that the number of primes up to $x$ is approximately
$$2e^{-\gamma}\,\frac{x}{\log x}. \tag{5.2.1}$$
This guess is not correct; the constant is off,¹⁰ as we will discuss in section 5.4.
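The heuristic can be tested numerically. This Python sketch (names hypothetical; $\gamma$ hard-coded to double precision) computes $\prod_{p\le y}(1-1/p)\cdot\log y$, which should be close to $e^{-\gamma}\approx 0.5615$, and evaluates the guess (5.2.1) so it can be compared with the true count of primes.

```python
import math

EULER_GAMMA = 0.5772156649015329  # hard-coded value of gamma


def primes_up_to(x):
    """Primes <= x (for x >= 2) by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if flags[n]]


def mertens_product(y):
    """prod_{p <= y} (1 - 1/p)."""
    return math.prod(1 - 1 / p for p in primes_up_to(y))


def sieve_guess(x):
    """The guess (5.2.1): 2 e^{-gamma} x / log x."""
    return 2 * math.exp(-EULER_GAMMA) * x / math.log(x)


for y in (10**3, 10**5):
    print(y, mertens_product(y) * math.log(y), math.exp(-EULER_GAMMA))
print(sieve_guess(10**6), len(primes_up_to(10**6)))  # the guess overshoots
```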


5.3. Primes in certain arithmetic progressions 


How are the primes split between the arithmetic progressions modulo 3? Or modulo 
4? Or modulo any given integer m? Evidently every integer in the arithmetic 
progression 0 (mod 3) (that is, integers of the form 3k) is divisible by 3, so the 
only prime in that arithmetic progression is 3 itself. There are no such divisibility 
restrictions for the arithmetic progressions 1 (mod 3) and 2 (mod 3) and if we 
partition the primes up to 100 into these arithmetic progressions, we find: 


Primes $\equiv 1\pmod 3$: 7, 13, 19, 31, 37, 43, 61, 67, 73, 79, 97, ...
Primes $\equiv 2\pmod 3$: 2, 5, 11, 17, 23, 29, 41, 47, 53, 59, 71, 83, 89, ...
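The counts above extend easily by computer. This Python sketch (names hypothetical) sieves the primes up to $x$ and tallies them by residue mod 3.

```python
import math


def primes_up_to(x):
    """Primes <= x (for x >= 2) by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if flags[n]]


def split_mod_3(x):
    """Number of primes <= x in each residue class mod 3."""
    counts = {0: 0, 1: 0, 2: 0}
    for p in primes_up_to(x):
        counts[p % 3] += 1
    return counts


print(split_mod_3(100))  # {0: 1, 1: 11, 2: 13}
print(split_mod_3(10**5))
```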


There seem to be a lot of primes in each arithmetic progression, and they seem to 
be roughly equally split between the two. Let’s see what we can prove. First let’s 
deal, in general, with the analogy to the case 0 (mod 3). This includes not only 0 
(mod m) but also cases like 2 (mod 4): 


Exercise 5.3.1. (a) Prove that any integer $\equiv a\pmod m$ is divisible by $(a,m)$.
(b) Deduce that if $(a,m)>1$ and if there is a prime $\equiv a\pmod m$, then that prime is $(a,m)$.
(c) Give examples of arithmetic progressions which contain exactly one prime and examples which contain none.
(d) Show that the arithmetic progression 2 (mod 6) contains infinitely many prime powers.


Therefore all but finitely many primes are distributed among the $\phi(m)$ arithmetic progressions $a\pmod m$ with $(a,m)=1$. How are they distributed? If the $m=3$ case is anything to go by, it appears that there are infinitely many in each such arithmetic progression, and maybe even roughly equal numbers of primes in each up to any given point.

⁹ This is a fact that is beyond the scope of this book but will be discussed in [Graa]. In fact $e^{-\gamma}=0.56145948\ldots$.

¹⁰ Though not by much. The correct constant is 1 whereas $2e^{-\gamma}=1.12291896\ldots$.


We will prove that there are infinitely many primes in each of the two feasible residue classes mod 3 (see Theorems 5.2 and 7.7).


Theorem 5.2. There are infinitely many primes $\equiv -1\pmod 3$.


Proof. Suppose that there are only finitely many primes $\equiv -1\pmod 3$, say $p_1,p_2,\ldots,p_k$. The integer $N=3p_1p_2\cdots p_k-1$ must have a prime factor $q\equiv -1\pmod 3$, by exercise 5.3.2. However $q$ divides both $N$ and $N+1$ (since it must be one of the primes $p_i$), and hence $q$ divides their difference, 1, which is impossible.


Exercise 5.3.2. Use exercise 3.1.4(a) to show that if $n\equiv -1\pmod 3$, then there exists a prime factor $p$ of $n$ which is $\equiv -1\pmod 3$.
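The proof of Theorem 5.2 can be run as a recipe: from any finite list of primes $\equiv -1\pmod 3$ it produces a new one. This Python sketch (names hypothetical) factors $N=3p_1\cdots p_k-1$ by trial division and returns its smallest prime factor $\equiv 2\pmod 3$; starting from $[2]$ it finds 5, then 29, then 11 (from $869=11\times 79$).

```python
import math


def prime_factors(n):
    """Distinct prime factors of n >= 2 in increasing order, by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def new_prime_2_mod_3(primes):
    """Smallest prime factor q with q % 3 == 2 of N = 3 * prod(primes) - 1."""
    N = 3 * math.prod(primes) - 1
    for q in prime_factors(N):
        if q % 3 == 2:
            return q
    raise ValueError("no prime factor congruent to 2 mod 3; exercise 5.3.2 rules this out")


primes = [2]
for _ in range(4):
    primes.append(new_prime_2_mod_3(primes))
print(primes)  # [2, 5, 29, 11, 1367]
```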


In 1837 Dirichlet showed that whenever (a,q) = 1 there are infinitely many 
primes = a (mod q). (We discuss this deep result in sections of appendix 
8D and [73.7]) In fact there are roughly equally many primes in each of these 
arithmetic progressions mod $q$. For example, half the primes are $\equiv 1\pmod 3$ and half are $\equiv -1\pmod 3$, as our data above suggested. Roughly 1% of the primes are $\equiv 69\pmod{101}$ and indeed there are roughly 1% of the primes in each arithmetic progression $a\bmod 101$ with $1\le a\le 100$. This is a deep result and will be discussed at length in our book [Graa].


Exercise 5.3.3. Prove that there are infinitely many primes $\equiv -1\pmod 4$.
Exercise 5.3.4. Prove that there are infinitely many primes $\equiv 5\pmod 6$.


Exercise 5.3.5.1 Prove that at least two of the arithmetic progressions mod 8 contain infinitely 
many primes. 


Exercise 5.8.6 generalizes these results considerably, using similar ideas.


5.4. How many primes are there up to x? 


When people started to develop large tables of primes, perhaps looking for a pattern, they discovered no patterns but did find that the proportion of integers that are prime is gradually diminishing (which will be proved in section of appendix 5B). In 1808 Legendre quantified this, suggesting that there are about $\frac{x}{\log x}$ primes up to $x$.¹¹ A few years earlier, aged 15 or 16, Gauss had already made a much better guess, based on studying tables of primes:


In 1792 or 1793 ... I turned my attention to the decreasing frequency 
of primes ... counting the primes in intervals of length 1000. I soon 
recognized that behind all of the fluctuations, this frequency is on average 
inversely proportional to the logarithm .... 

— from a letter to ENCKE by K. F. Gauss (Christmas Eve, 1849) 


¹¹ And even the more precise assertion that there exists a constant $B$ such that $\pi(x)$, the number of primes up to $x$, is well-approximated by $x/(\log x-B)$ for large enough $x$. This turns out to be true with $B=1$, though this was not the value for $B$ suggested by Legendre (who presumably made a guess based on data for small values of $x$).



His observation may be best phrased as

About 1 in $\log x$ of the integers near $x$ are prime,

which is (subtly) different from Legendre’s assertion: Gauss’s observation suggests that a good approximation to the number of primes up to $x$ is $\sum_{n=2}^{x}\frac{1}{\log n}$. As $\frac{1}{\log t}$ does not vary much for $t$ between $n$ and $n+1$, Gauss deduced that $\pi(x)$ should be well-approximated by
$$\int_2^x\frac{dt}{\log t}. \tag{5.4.1}$$
We denote this quantity by $\mathrm{Li}(x)$ and call it the logarithmic integral.¹² The logarithm here is again the natural logarithm. Here is a comparison of Gauss’s prediction with the actual count of primes up to various values of $x$:


x        π(x) = #{primes ≤ x}          Overcount: Li(x) − π(x)
10^3     168                           10
10^4     1229                          17
10^5     9592                          38
10^6     78498                         130
10^7     664579                        339
10^8     5761455                       754
10^9     50847534                      1701
10^10    455052511                     3104
10^11    4118054813                    11588
10^12    37607912018                   38263
10^13    346065536839                  108971
10^14    3204941750802                 314890
10^15    29844570422669                1052619
10^16    279238341033925               3214632
10^17    2623557157654233              7956589
10^18    24739954287740860             21949555
10^19    234057667276344607            99877775
10^20    2220819602560918840           222744644
10^21    21127269486018731928          597394254
10^22    201467286689315906290         1932355208
10^23    1925320391606803968923        7250186216
10^24    18435599767349200867866       17146907278
10^25    176846309399143769411680      55160980939

Primes up to various x and the overcount in Gauss’s prediction.


Gauss’s prediction is amazingly accurate. From the data, Gauss’s prediction seems to overcount by a small amount, for all $x>8$.¹³ To quantify this “small amount”, we observe that the last column (representing the overcount) is always about half the width of the central column (representing the number of primes up to $x$), so these data suggest that the difference is no bigger than a small multiple of $\sqrt x$.
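The first few rows of the table can be reproduced with a short Python sketch (our own; names hypothetical). It evaluates (5.4.1) by substituting $t=e^u$, which turns the integrand into $e^u/u$, applies Simpson's rule, and counts primes with a sieve. Since the integral here starts at 2, the computed overcounts may differ slightly from the tabulated ones, by about the constant 1.045... mentioned in the footnote on the definition of Li(x).

```python
import math


def Li(x, steps=20000):
    """Integral from 2 to x of dt / log t, via t = e^u and Simpson's rule (steps must be even)."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        u = a + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(u) / u
    return total * h / 3


def prime_pi(x):
    """Number of primes <= x (for x >= 2), by the sieve of Eratosthenes."""
    flags = bytearray([1]) * (x + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, x + 1, p)))
    return sum(flags)


for k in range(3, 7):
    x = 10 ** k
    print(x, prime_pi(x), round(Li(x) - prime_pi(x), 1))
```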


¹² Some authors begin the integral defining $\mathrm{Li}(x)$ at $x=0$. This adds complication since the integrand equals $\infty$ at $t=1$; nonetheless this can be handled, and the difference between the two definitions is then the constant 1.045163..., which has little relevance to our discussion.

¹³ It is not true that $\mathrm{Li}(x)>\pi(x)$ for all $x>8$, but the first counterexample is far beyond where we can hope to calculate. Understanding how we know this is well beyond the scope of this book, but see [Graa].



This might be optimistic but, at the very least, the ratio of $\pi(x)$, the number of primes up to $x$, to Gauss’s guess, $\mathrm{Li}(x)$, should tend to 1 as $x\to\infty$; that is,
$$\pi(x)\Big/\mathrm{Li}(x)\to 1\quad\text{as } x\to\infty.$$
In exercise 5.8.11 we show that $\mathrm{Li}(x)\Big/\frac{x}{\log x}\to 1$ as $x\to\infty$, and combining these last two limits, we deduce that
$$\pi(x)\Big/\frac{x}{\log x}\to 1\quad\text{as } x\to\infty.$$


The notation of limits is cumbersome; it is easier to write
$$\pi(x)\sim\frac{x}{\log x} \tag{5.4.2}$$
as $x\to\infty$, “$\pi(x)$ is asymptotic to $x/\log x$”.¹⁴ This is different by a multiplicative constant from (5.2.1), our guesstimate based on the sieve of Eratosthenes. Our data makes it seem more likely that the constant 1 given here, rather than the $2e^{-\gamma}$ given in (5.2.1), is the correct constant.


The asymptotic (5.4.2) is called the prime number theorem. Its proof came in 1896, more than 100 years after Gauss’s guess, involving several remarkable developments. It was a high point of 19th-century mathematics and there is still no straightforward approach. The main reason is that the prime number theorem can be shown to be equivalent to a statement about zeros of the analytic continuation of a function (the Riemann zeta-function, which we discuss in appendices 4B, 5B, and 5D), which seems preposterous at first sight. Although proofs can be given that avoid mentioning these zeros, they are still lurking somewhere just beneath the surface.¹⁵ A proof of the prime number theorem is beyond the scope of this book (but see and [GS]).


Exercise 5.4.1.† Assume the prime number theorem.
(a) Show that there are infinitely many primes whose leading digit is a “1”. How about leading digit “7”?
(b) Show that for all $\epsilon > 0$, if $x$ is sufficiently large, then there are primes between $x$ and $x + \epsilon x$.
(c) Deduce that $\mathbb{R}_{\geq 0}$ is the set of limit points of the set $\{p/q : p, q \text{ primes}\}$.
(d) Let $a_1, \ldots, a_d$ be any sequence of digits, that is, integers between 0 and 9, with $a_1 \neq 0$. Show that there are infinitely many primes whose first (leading) $d$ digits are $a_1, \ldots, a_d$.


Exercise 5.4.2.† Let $p_1 = 2 < p_2 = 3 < \cdots$ be the sequence of primes. Assume the prime number theorem and prove that

$$p_n \sim n \log n \quad \text{as } n \to \infty.$$


Exercise 5.4.3.† (a) Show that the sum of primes and prime powers $\leq x$ is $\sim x^2/(2\log x)$.
(b) Deduce that if the sum equals $N$, then $x \sim \sqrt{N \log N}$.


14 In general, $A(x) \sim B(x)$, that is, $A(x)$ is asymptotic to $B(x)$, is equivalent to $\lim_{x \to \infty} A(x)/B(x) = 1$. It does not mean that “$A(x)$ is approximately equal to $B(x)$”, which has no strict mathematical meaning, rather that for any $\epsilon > 0$, no matter how small, one has

$$(1 - \epsilon) B(x) \leq A(x) \leq (1 + \epsilon) B(x)$$

once $x$ is sufficiently large (where how large is “sufficiently large” depends on $\epsilon$). This definition concerns the ratio $A(x)/B(x)$, not their difference $A(x) - B(x)$. Therefore $n^2 + 1 \sim n^2$ and $n^2 + n^2/\log n \sim n^2$ are equally true, even though the first is a better approximation to $n^2$ than the second ([Sha85], p. 16).
15 Including the so-called “elementary proof” of the prime number theorem.




Primes in arithmetic progressions. As we mentioned in section 5.3, Dirichlet showed in 1837 that if $(a, q) = 1$, then there are infinitely many primes $p \equiv a \pmod q$. Dirichlet’s proof was combined in 1896 with the proof of the prime number theorem to establish that

$$\#\{p \leq x : p \text{ prime},\ p \equiv a \pmod q\} \sim \frac{\pi(x)}{\phi(q)} \quad \text{as } x \to \infty.$$

The factor “$1/\phi(q)$” emerges as there are $\phi(q)$ reduced residues $a$ modulo $q$.
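To see the factor $1/\phi(q)$ in practice, the sketch below (our own illustration, with the arbitrary choices $q = 10$ and $x = 10^6$) counts the primes up to $x$ in each reduced residue class modulo $q$ and compares each count with $\pi(x)/\phi(q)$. Here $\phi(q)$ is computed simply as the number of reduced residues; the helper names are ours.

```python
import math
from collections import Counter

def primes_up_to(x: int) -> list[int]:
    """All primes <= x, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]

def residue_class_counts(x: int, q: int) -> dict[int, int]:
    """Map each reduced residue a mod q to #{p <= x prime : p = a (mod q)}."""
    counts = Counter(p % q for p in primes_up_to(x))
    return {a: counts[a] for a in range(q) if math.gcd(a, q) == 1}

x, q = 10**6, 10
counts = residue_class_counts(x, q)
pi_x = len(primes_up_to(x))
phi_q = len(counts)  # the number of reduced residues mod q
for a, c in counts.items():
    print(f"p = {a} (mod {q}): {c} primes, ratio to pi(x)/phi(q) = {c / (pi_x / phi_q):.4f}")
```

The two primes 2 and 5 that divide $q = 10$ fall outside every reduced class, which is why the four counts sum to $\pi(x) - 2$.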


Exercise 5.4.4.† Use the prime number theorem in arithmetic progressions to prove that for any integers $a_1, \ldots, a_d, b_0, \ldots, b_{d-1} \in \{0, \ldots, 9\}$, with $a_1 \neq 0$ and $b_0 = 1, 3, 7$, or $9$, there are infinitely many primes whose first $d$ digits are $a_1, \ldots, a_d$ and whose last $d$ digits are $b_{d-1}, \ldots, b_0$.


5.5. Bounds on the number of primes 


The first quantitative lower bound proven on the number of primes is due to Euler in the mid-18th century, who showed that

$$\sum_{p \text{ prime}} \frac{1}{p} \quad \text{diverges},$$

as we will prove in section of appendix 5B. This gives some idea of how numerous the primes are in comparison to other sequences of integers. For example $\sum_{n \geq 1} 1/n^2$ converges, so the primes are, in this sense, more numerous than the squares. This implies that there are arbitrarily large values of $x$ for which

$$\pi(x) > \sqrt{x}.$$


Exercise 5.5.1.† Do better than this using Euler’s result.
(a) Prove that $\sum_{n \geq 2} \frac{1}{n (\log n)^2}$ converges.
(b) Deduce that there are arbitrarily large $x$ for which $\pi(x) > x/(\log x)^2$.


Next we will prove upper and lower bounds for the number of primes up to $x$, of the form

$$c_1 \frac{x}{\log x} \leq \pi(x) \leq c_2 \frac{x}{\log x} \qquad (5.5.1)$$

for some constants $0 < c_1 < 1 < c_2$, for all sufficiently large $x$. The prime number theorem is equivalent to being able to take $c_1 = 1 - \epsilon$ and $c_2 = 1 + \epsilon$ for any fixed $\epsilon > 0$ in (5.5.1). Instead we will prove Chebyshev’s weaker 1850 result that one can take any $c_1 < \log 2$ and any $c_2 > \log 4$ in (5.5.1).


Theorem 5.3. For all integers $n \geq 2$ we have

$$(\log 2)\, \frac{n}{\log n} - 1 \leq \pi(n) \leq (\log 4)\, \frac{n}{\log n} + \frac{4n}{(\log n)^2}.$$

Exercise 5.5.2. Fix $\epsilon > 0$ arbitrarily small. Deduce Chebyshev’s bounds (5.5.1) with $c_1 = \log 2 - \epsilon$ and $c_2 = \log 4 + \epsilon$, for all sufficiently large $x$, from Theorem 5.3.


Proof. The binomial theorem states that $(x + y)^N = \sum_{j=0}^{N} \binom{N}{j} x^j y^{N-j}$. Taking $x = y = 1$ we get

$$\sum_{j=0}^{N} \binom{N}{j} = 2^N. \qquad (5.5.2)$$




Lemma 5.5.1. The product of the primes up to $N$ is $\leq 4^{N-1}$ for all $N \geq 1$.


Proof. Each prime in $[n+1, 2n]$ appears in the numerator of the binomial coefficient $\binom{2n-1}{n}$ but not in the denominator, and so their product divides $\binom{2n-1}{n}$. Now if $N = 2n - 1$ is odd, then $\binom{2n-1}{n-1} = \binom{2n-1}{n}$, so the value appears twice in the sum in (5.5.2). Therefore

$$\prod_{\substack{n < p \leq 2n \\ p \text{ prime}}} p \;\leq\; \binom{2n-1}{n} \;=\; \frac{1}{2}\left(\binom{2n-1}{n-1} + \binom{2n-1}{n}\right) \;\leq\; \frac{1}{2} \sum_{j=0}^{2n-1} \binom{2n-1}{j} \;=\; 2^{2n-2} \;=\; 4^{n-1}. \qquad (5.5.3)$$

We now prove the claimed result by induction on $N \geq 1$. The result is straightforward for $N = 1, 2$ by calculation. If $N = 2n$ or $2n - 1$, then the product of the primes up to $N$ is at most the product of the primes up to $n$ times the product of the primes in $[n + 1, 2n]$. The first product is $\leq 4^{n-1}$ by the induction hypothesis, and the second is $\leq 4^{n-1}$ by the previous paragraph. Combining these two upper bounds gives the upper bound $\leq 4^{2n-2} \leq 4^{N-1}$, as claimed.
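Lemma 5.5.1 is easy to test with exact integer arithmetic. This Python sketch (ours; the cut-off 3000 and the helper names are arbitrary) multiplies the primes together one at a time and checks the bound $\prod_{p \leq N} p \leq 4^{N-1}$ for every $N$ up to the cut-off. A finite check of this kind supports, but of course does not replace, the induction above.

```python
import math

def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime (0 <= n <= limit)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

def lemma_5_5_1_holds(limit: int) -> bool:
    """Check prod_{p <= N} p <= 4^(N-1) for all 1 <= N <= limit."""
    flags = prime_flags(limit)
    product = 1
    for N in range(1, limit + 1):
        if flags[N]:
            product *= N          # now the product of the primes up to N
        if product > 4 ** (N - 1):
            return False
    return True

print(lemma_5_5_1_holds(3000))
```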


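If we take logarithms in (5.5.3), we obtain

$$\sum_{\substack{n < p \leq 2n \\ p \text{ prime}}} \log p \leq (n-1) \log 4. \qquad (5.5.4)$$

As each term of the left side is $> \log n$, we deduce that

$$\pi(2n) - \pi(n) = \#\{p \text{ prime} : n < p \leq 2n\} < \frac{n-1}{\log n} \cdot \log 4. \qquad (5.5.5)$$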


We now use this to deduce the upper bound claimed in Theorem 5.3. We verify the bound by calculations for all $N \leq 100$ and then proceed by induction for $N \geq 101$. If $N = 2n$ or $2n - 1$ (so that $n \geq 51$), then by the induction hypothesis and (5.5.5),

$$\pi(N) \leq \pi(2n) = \pi(n) + \big(\pi(2n) - \pi(n)\big) \leq (\log 4) \frac{2n-1}{\log n} + \frac{4n}{(\log n)^2},$$

and for all $n \geq 51$ this is

$$\leq (\log 4) \frac{2n-1}{\log 2n} + \frac{4(2n-1)}{(\log 2n)^2} \leq (\log 4) \frac{N}{\log N} + \frac{4N}{(\log N)^2},$$

as a careful calculation reveals. This yields the upper bound claimed in Theorem 5.3.


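To obtain the lower bound claimed in Theorem 5.3, we begin by observing that the largest binomial coefficient $\binom{n}{m}$ occurs with $m = [n/2]$. All the other binomial coefficients are smaller, as is $\binom{n}{0} + \binom{n}{n}$, so that

$$2^n = \left(\binom{n}{0} + \binom{n}{n}\right) + \sum_{j=1}^{n-1} \binom{n}{j} \leq n \binom{n}{[n/2]}$$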




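by (5.5.2). Now, all prime factors of any $\binom{n}{m}$ are $\leq n$, and in fact if $p^e$ divides $\binom{n}{m}$ then $p^e \leq n$ by Corollary 3.10.1 to Kummer’s Theorem. Therefore

$$\frac{2^n}{n} \leq \binom{n}{[n/2]} \leq \prod_{p \leq n} n = n^{\pi(n)}.$$

Taking logarithms gives $n \log 2 - \log n \leq \pi(n) \log n$, and dividing through by $\log n$ we deduce the claimed result.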
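Theorem 5.3 can likewise be checked numerically over a finite range. In the sketch below (our own; the range up to $10^5$ and all names are arbitrary) `theorem_5_3_counterexample` returns the first $n$ violating either bound, or `None`, and `induction_step_ok` tests, for $51 \leq n < 10^5$, the elementary inequality behind the phrase “a careful calculation reveals”, namely $(\log 4)\frac{2n-1}{\log n} + \frac{4n}{(\log n)^2} \leq (\log 4)\frac{2n-1}{\log 2n} + \frac{4(2n-1)}{(\log 2n)^2}$. A finite floating-point check like this is evidence, not a proof.

```python
import math

def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime (0 <= n <= limit)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

def theorem_5_3_counterexample(limit: int):
    """Return the first n in [2, limit] violating Theorem 5.3, or None."""
    flags = prime_flags(limit)
    pi_n = 0
    for n in range(2, limit + 1):
        pi_n += flags[n]                      # now pi_n == pi(n)
        L = math.log(n)
        lower = math.log(2) * n / L - 1
        upper = math.log(4) * n / L + 4 * n / L**2
        # small tolerance: the lower bound is attained exactly at n = 2
        if not (lower <= pi_n + 1e-9 and pi_n <= upper):
            return n
    return None

def induction_step_ok(n: int) -> bool:
    """The inequality used in the induction step for the upper bound (n >= 51)."""
    l1, l2 = math.log(n), math.log(2 * n)
    lhs = math.log(4) * (2 * n - 1) / l1 + 4 * n / l1**2
    rhs = math.log(4) * (2 * n - 1) / l2 + 4 * (2 * n - 1) / l2**2
    return lhs <= rhs

print(theorem_5_3_counterexample(10**5))
print(all(induction_step_ok(n) for n in range(51, 10**5)))
```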


Exercise 5.5.3. Use exercise 3.10.3 and the last displayed equation to prove that

$$\operatorname{lcm}[m : m \leq n] \geq \frac{2^n}{n}. \qquad (5.5.6)$$


5.6. Gaps between primes 


Let $p_1 = 2 < p_2 = 3 < \cdots$ be the sequence of primes. We are interested in the possible gaps, $p_{n+1} - p_n$, between primes.

The prime number theorem tells us that there are about $x/\log x$ primes up to $x$, so that the average gap between primes $\leq x$ is about $\log x$. If $N = \pi(x)$, then $p_N$ is the largest prime $\leq x$, and $p_N \sim x$ by exercise 5.4.1(b). This implies that the average gap between consecutive primes up to $x$ is

$$\frac{1}{N-1} \sum_{n=1}^{N-1} (p_{n+1} - p_n) = \frac{p_N - 2}{N - 1} \sim \frac{x}{x/\log x} = \log x$$

by the prime number theorem.
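The telescoping in this average is easy to confirm on actual data. The sketch below (ours; the cut-offs are arbitrary) computes the mean of the gaps $p_{n+1} - p_n$ for primes up to $x$ directly and compares it with $(p_N - 2)/(N-1)$ and with $\log x$. Since the prime number theorem only gives an asymptotic, the ratio to $\log x$ approaches 1 rather slowly.

```python
import math

def primes_up_to(x: int) -> list[int]:
    """All primes <= x, by the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(x + 1) if sieve[n]]

def average_gap(x: int) -> float:
    """Mean of p_{n+1} - p_n over consecutive primes p_n < p_{n+1} <= x."""
    ps = primes_up_to(x)
    gaps = [q - p for p, q in zip(ps, ps[1:])]
    return sum(gaps) / len(gaps)

for k in range(2, 7):
    x = 10**k
    ps = primes_up_to(x)
    N = len(ps)
    telescoped = (ps[-1] - 2) / (N - 1)   # (p_N - p_1) / (N - 1)
    print(f"x = 10^{k}: average gap {average_gap(x):.4f}, "
          f"(p_N - 2)/(N - 1) = {telescoped:.4f}, log x = {math.log(x):.4f}")
```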


Are there gaps between consecutive primes that are much smaller than the 
average? Much larger than the average? What is the largest that gaps between 
primes can be, and what is the smallest? 


Exercise 5.6.1. (a) Prove that there are gaps between primes $\leq x$ that are at least as large as the average gap between primes up to $x$.
(b) Prove that there are gaps between primes $\leq x$ that are no bigger than the average gap between primes up to $x$.


Legendre conjectured that there are always primes between consecutive squares, that is, that there are primes in the interval $(n^2, (n+1)^2)$ for every positive integer $n$.

Exercise 5.6.2. (a) Show that if every interval $(x, x + 2\sqrt{x})$ contains a prime, then there are always primes between consecutive squares.
(b) Show that if there are always primes between consecutive squares, then every interval $(x, x + 4\sqrt{x} + 3]$ contains a prime.


At present we do not know how to prove that every interval $(x, x + C\sqrt{x})$ contains primes, for any given $C > 0$. However it has been proven, by Baker, Harman, and Pintz, that any interval $(x, x + c x^{21/40}]$ contains a prime for $c$ sufficiently large.


Exercise 5.6.3. Deduce from this that there is a prime between any consecutive, sufficiently 
large, cubes. 




There is a simple way to construct a long interval which contains no primes: 


Proposition 5.6.1. For any integer $m$ the interval $m! + 2, m! + 3, \ldots, m! + m$ contains no primes. Therefore if $p_n$ is the largest prime $\leq m! + 1$, then $p_{n+1} - p_n \geq m$.

Proof. If $2 \leq j \leq m$, then $j$ is included in the product for $m!$, and so $j$ divides $m! + j$. Therefore $m! + j$ is composite as it is $> j$. Now $p_{n+1} \neq m! + j$ for each such $j$ and so $p_{n+1} \geq m! + m + 1 \geq p_n + m$.
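Proposition 5.6.1 can be illustrated for small $m$. The sketch below (ours) uses trial division, which is adequate only because $m!$ stays small for $m \leq 10$; the hypothetical helper `factorial_gap` returns the largest prime $\leq m! + 1$ together with the next prime after it, and the loop prints their difference next to $m$.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def factorial_gap(m: int) -> tuple[int, int]:
    """Return (p_n, p_{n+1}) where p_n is the largest prime <= m! + 1."""
    M = math.factorial(m)
    # m! + j is divisible by j for 2 <= j <= m, as in the proof above
    assert all((M + j) % j == 0 for j in range(2, m + 1))
    lo = M + 1
    while not is_prime(lo):
        lo -= 1
    hi = M + 2
    while not is_prime(hi):
        hi += 1
    return lo, hi

for m in range(2, 11):
    lo, hi = factorial_gap(m)
    print(f"m = {m}: p_n = {lo}, p_(n+1) = {hi}, gap = {hi - lo}")
```

For example, with $m = 5$ the primes found are 113 and 127, a gap of 14, comfortably more than 5.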


The gaps between primes constructed in this way are not quite as large as the average gaps. However one can extend this idea, creating a long interval of integers which each have a small prime factor, to prove that

$$\limsup_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = \infty.$$

Proving this is again beyond the scope of this book but a proof can be found in [Gra].


What about small gaps between primes? 


Exercise 5.6.4. Prove that 2 and 3 are the only two primes that differ by 1. 


There are plenty of pairs of primes that differ by two, namely 3 and 5, 5 and 7, 11 and 13, 17 and 19, etc., seemingly infinitely many, and this twin prime conjecture, that there are infinitely many prime twins $p, p + 2$, remains an open problem. Until recently, very little was proved about short gaps between primes, but that changed in 2009, when Goldston, Pintz, and Yıldırım (see [1]) showed that

$$\liminf_{n \to \infty} \frac{p_{n+1} - p_n}{\log p_n} = 0.$$

In 2013, Yitang Zhang, until then a practically unknown mathematician,¹⁶ showed that there are infinitely many pairs of primes that differ by at most a bounded amount. More precisely there exists a constant $B$ such that there are infinitely many pairs of distinct primes that differ by at most $B$. This was soon improved by Maynard and Tao, though by a different method, so that we now know there are infinitely many pairs of consecutive primes $p_n, p_{n+1}$ such that

$$p_{n+1} - p_n \leq 246.$$

This is not quite the twin prime conjecture, but it is a very exciting development. (See [2] for a discussion.)


The proofs of Maynard and of Tao yield a further great result: For any in- 
teger m > 3 there are infinitely many intervals of length 214 which contain 
m primes. That is, there are infinitely many m-tuples of consecutive primes 
Pns Pntls+++;Pn+m-—1 such that 


14 
Pn+m—-1 — Pn < 2 ee 


16 See the movie Counting from Infinity (Zala Films, 2015) for an account of his fascinating story. 




Further reading on hot topics in this section
[1] K. Soundararajan, Small gaps between prime numbers: The work of Goldston-Pintz-Yıldırım, Bull. Amer. Math. Soc. (N.S.) 44 (2007), 1-18.
[2] Andrew Granville, Primes in intervals of bounded length, Bull. Amer. Math. Soc. (N.S.) 52 (2015), 171-222.


5.7. Formulas for primes 


Are there polynomials (of degree $\geq 1$) that only yield prime values? That is, is $f(n)$ prime for every integer $n$? The example $6n + 5$ begins by taking the prime values 5, 11, 17, 23, 29 before getting to $35 = 5 \times 7$. Continuing on, we get more primes 41, 47, 53, 59 till we hit $65 = 5 \times 13$, another multiple of 5. So every fifth term of the arithmetic progression seems to be divisible by 5, which we verify as $6(5k) + 5 = 5(6k + 1)$. More generally $qn + a$ is a multiple of $a$ whenever $n$ is a multiple of $a$, since $q(ak) + a = a(qk + 1)$. A famous example of a polynomial that takes lots of prime values is $f(x) = x^2 + x + 41$. Indeed $f(n)$ is prime for $0 \leq n \leq 39$. However $f(40) = 41^2$ and $f(41k) = 41(41k^2 + k + 1)$. Therefore $f(41k)$ is composite for each integer $k$ for which $41k^2 + k + 1 \neq -1, 0$, or $1$.
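These claims are quick to verify by machine. The sketch below (ours, using simple trial division) checks that nine of the first ten values of $6n + 5$ are prime, that $n^2 + n + 41$ is prime for $0 \leq n \leq 39$, that $f(40) = 41^2$, and that $f(41k) = 41(41k^2 + k + 1)$ for $1 \leq k < 20$.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def f(n: int) -> int:
    return n * n + n + 41

print(sum(is_prime(6 * n + 5) for n in range(10)))   # 9: only 35 = 5 x 7 fails
print(all(is_prime(f(n)) for n in range(40)))          # True
print(f(40) == 41**2)                                  # True
print(all(f(41 * k) == 41 * (41 * k * k + k + 1) for k in range(1, 20)))  # True
```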

We will develop this argument to work for all polynomials, but we will need the following result, which is a consequence of the Fundamental Theorem of Algebra and is proved in Theorem 3.11 of section 3.22 in appendix 3F.

Lemma 5.7.1. A non-zero degree $d$ polynomial has no more than $d$ distinct roots in $\mathbb{C}$.


The main consequence that we need is the following: 


Corollary 5.7.1. Suppose that $f(x) \in \mathbb{Z}[x]$ has degree $d \geq 1$. For any integer $B \geq 1$, there are no more than $(2B + 1)d$ integers $n$ for which $|f(n)| \leq B$.

Proof. If $n$ is an integer, then so is $f(n)$, and therefore if $|f(n)| \leq B$, then $f(n) = m$ for some integer $m$ with $|m| \leq B$. Therefore $n$ is a root of one of the $2B + 1$ polynomials $f(x) - m$, each of which has no more than $d$ roots by Lemma 5.7.1, and so the result follows.


Proposition 5.7.1. If $f(x) \in \mathbb{Z}[x]$ has degree $d \geq 1$, then there are infinitely many integers $n$ for which $|f(n)|$ is composite.

Proof. By Corollary 5.7.1 there are no more than $3d$ integers $n$ for which $f(n) = -1, 0$, or $1$, so there exists an integer $a$ in the range $0 \leq a \leq 3d$ for which $|f(a)| > 1$. Let $m := |f(a)| > 1$. Now $km + a \equiv a \pmod m$ and so, by Corollary 2.3.1, we have

$$f(km + a) \equiv f(a) \equiv 0 \pmod m.$$

There are at most $3d$ values of $k$ for which $km + a$ is a root of one of $f(x) - m$, $f(x)$, or $f(x) + m$, by Corollary 5.7.1. For any other $k$ we have that $|f(km + a)| \neq 0$ or $m$, in which case $|f(km + a)|$ is divisible by $m$ and $|f(km + a)| > m$, so that $|f(km + a)|$ is composite.
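The proof of Proposition 5.7.1 is constructive, so it can be run. In the sketch below (ours) the polynomial is represented by its list of coefficients, which is an assumption of this illustration, and `composite_values` is a hypothetical helper name. It finds $a \in [0, 3d]$ with $|f(a)| > 1$, sets $m = |f(a)|$, and collects values $n = km + a$ for which $m$ divides $|f(n)|$ and $|f(n)| \notin \{0, m\}$, so that $|f(n)|$ is composite.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def composite_values(coeffs: list[int], how_many: int) -> tuple[int, int, list[int]]:
    """coeffs = [c0, c1, ..., cd] encodes f(x) = c0 + c1*x + ... + cd*x^d, d >= 1.
    Returns (a, m, ns) where |f(n)| is composite for every n in ns."""
    f = lambda x: sum(c * x**i for i, c in enumerate(coeffs))
    d = len(coeffs) - 1
    # Corollary 5.7.1 with B = 1: some a in [0, 3d] has |f(a)| > 1
    a = next(a for a in range(3 * d + 1) if abs(f(a)) > 1)
    m = abs(f(a))
    ns, k = [], 0
    while len(ns) < how_many:
        n = k * m + a
        value = abs(f(n))
        assert value % m == 0          # f(km + a) ≡ f(a) ≡ 0 (mod m)
        if value not in (0, m):        # then m | value and value > m > 1
            ns.append(n)
        k += 1
    return a, m, ns

a, m, ns = composite_values([41, 1, 1], 5)   # f(x) = x^2 + x + 41
print(a, m, ns)                               # 0 41 [41, 82, 123, 164, 205]
```

For $x^2 + x + 41$ the construction recovers exactly the multiples of 41 noted earlier.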


Exercise 5.7.1. Show that if $f(x, y) \in \mathbb{Z}[x, y]$ has degree $d \geq 1$, then there are infinitely many pairs of integers $m, n$ for which $|f(m, n)|$ is composite.




Nine of the first ten values of the polynomial $6n + 5$ are primes. The polynomial $n^2 + n + 41$, discovered by Euler in 1772, is prime for $n = 0, 1, 2, \ldots, 39$ and the square of a prime for $n = 40$. However, in the proof of Proposition 5.7.1 we saw that $n^2 + n + 41$ is composite whenever $n$ is a positive multiple of 41. See section 12.5 for more on such prime rich polynomials.


We discuss other places to look for primes in section 5.21 of appendix 5G.


It is not difficult to show that if a polynomial f takes on infinitely many prime 
values, then f must be irreducible. The next result indicates how many prime 
values f needs to take before we know that f is irreducible. 


Theorem 5.4. If $f(x) \in \mathbb{Z}[x]$ has degree $d \geq 1$ and $|f(n)|$ is prime for $\geq 2d + 1$ integers $n$, then $f(x)$ is irreducible.

Proof. Suppose that $f$ is reducible; that is, $f = gh$ for polynomials $g(x), h(x) \in \mathbb{Z}[x]$. If $|f(n)| = p$, a prime, then $g(n)h(n) = p$ or $-p$. Therefore one of $g(n)$ and $h(n)$ equals $p$ or $-p$, the other $1$ or $-1$. In particular $n$ is a root of $(g(x) - 1)(h(x) - 1)(g(x) + 1)(h(x) + 1)$, a polynomial of degree $2d$. This has no more than $2d$ roots by Lemma 5.7.1 and so $|f(n)|$ can be prime for no more than $2d$ integers $n$.


This is often more than we need, as we see in the following beautiful result: 


Theorem 5.5. Write a given prime $p$ in base 10 as $p = a_0 + a_1 \cdot 10 + \cdots + a_d \cdot 10^d$ (with each $a_j \in \{0, 1, 2, \ldots, 9\}$ and $a_d \neq 0$). Then $a_0 + a_1 x + \cdots + a_d x^d$ is an irreducible polynomial.


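Proof. Let $f(x) = a_0 + \cdots + a_d x^d$ and suppose that $f = gh$. As $g(10)h(10)$ is prime, one of $g(10)$ and $h(10)$ equals $1$ or $-1$. We will suppose that it is $g$ (swapping $g$ and $h$ if necessary). As $g(x) \in \mathbb{Z}[x]$ it can be written in the form $g(x) = c \prod_j (x - \alpha_j)$ with $c \in \mathbb{Z}$, and so $\prod_j |10 - \alpha_j| \leq |g(10)| = 1$. Therefore there is a root $\alpha$ of $g(x)$ for which $|\alpha - 10| \leq 1$. This implies that $\mathrm{Re}(\alpha) \in [9, 11]$ and so $\mathrm{Re}(1/\alpha) > 0$ and $|\alpha| \geq 9$.

As $f(\alpha) = 0$ we deduce that

$$0 = \mathrm{Re}\left(\frac{f(\alpha)}{\alpha^d}\right) = a_d + a_{d-1}\, \mathrm{Re}\left(\frac{1}{\alpha}\right) + \sum_{i=2}^{d} a_{d-i}\, \mathrm{Re}\left(\frac{1}{\alpha^i}\right).$$

As discussed above $\mathrm{Re}(1/\alpha) > 0$ and so $a_{d-1}\mathrm{Re}(1/\alpha) \geq 0$. On the other hand, $\mathrm{Re}(1/\alpha^i)$ might be negative and so $a_{d-i}\mathrm{Re}(1/\alpha^i) \geq -9/|\alpha|^i$. Therefore, since $a_d \geq 1$,

$$0 \geq 1 + 0 - 9 \sum_{i=2}^{d} \frac{1}{|\alpha|^i},$$

which implies that

$$1 \leq 9 \sum_{i \geq 2} \frac{1}{|\alpha|^i} = \frac{9}{|\alpha|\,(|\alpha| - 1)} \leq \frac{1}{8},$$

as $|\alpha| \geq 9$, which yields a contradiction.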


Exercise 5.7.2. Prove an analogous result for primes written in an arbitrary base b > 3. 


Questions about primes 169 


Exercise 5.7.3.† Suppose that $f(x) = a_0 + \cdots + a_d x^d \in \mathbb{Z}[x]$ with each $|a_i| \leq A$ and $a_d \neq 0$. Prove that if $f(n)$ is prime for some integer $n \geq A + 2$, then $f(x)$ is irreducible.


There are many books on the distribution of primes. My favorites for beginners are [TMF00], which explains the key ideas behind the prime number theorem and other important results in an accessible way, and which is more recreational but full of good stuff. The introductory book [HW08] proves quite a few of the easier theorems in the subject.


Additional exercises 


Exercise 5.8.1. Let $m$ be the product of the primes $< 1000$. Prove that if $n$ is an integer between $10^3$ and $10^6$, then $n$ is prime if and only if $(n, m) = 1$.


Exercise 5.8.2. Show that if p > 3 and q = p+ 2 are twin primes, then p+ q is divisible by 12. 


Exercise 5.8.3. Show that there are infinitely many integers n for which each of n,n+1,...,n+ 
1000 is composite. 


Exercise 5.8.4. Fix integer $m > 1$. Show that there are infinitely many integers $n$ for which $\tau(n) = m$.

Exercise 5.8.5.† Fix integer $k \geq 1$. Prove that there are infinitely many integers $n$ for which $\pi(n) = \pi(n + 1) = \cdots = \pi(n + k)$.


Exercise 5.8.6. Let $H$ be a proper subgroup¹⁷ of $(\mathbb{Z}/m\mathbb{Z})^*$.
(a) Show that if $a$ is coprime to $m$ and $q$ is a given non-zero integer, then there are infinitely many integers $n \equiv a \pmod m$ such that $(n, q) = 1$.
(b) Prove that if $n$ is an integer coprime to $m$ but which is not in a residue class of $H$, then $n$ has a prime factor which is not in a residue class of $H$.
(c) Deduce there are infinitely many primes which do not belong to any residue class of $H$.

Exercise 5.8.7.† Suppose that for any coprime integers $a$ and $q$ there exists at least one prime $\equiv a \pmod q$. Deduce that for any coprime integers $A$ and $Q$, there are infinitely many primes $\equiv A \pmod Q$.


Exercise 5.8.8. Prove that there are infinitely many primes $p$ for which there exists an integer $a$ such that $a^2 - a + 1 \equiv 0 \pmod p$.

Exercise 5.8.9. Prove that for any $f(x) \in \mathbb{Z}[x]$ of degree $\geq 1$, there are infinitely many primes $p$ for which there exists an integer $a$ such that $p$ divides $f(a)$.

Exercise 5.8.10. Let $L(n) = \operatorname{lcm}[1, 2, \ldots, n]$.
(a) Show that $L(n)$ divides $L(n + 1)$ for all $n \geq 1$.
(b) Express $L(n)$ as a function of the prime powers $\leq n$.
(c) Prove that for any integer $k$ there exist integers $n$ for which $L(n) = L(n + 1) = \cdots = L(n + k)$.
(d)* Prove that if $k$ is sufficiently large, then there is such an integer $n$ which is $\leq 3^k$.

Exercise 5.8.11.† Prove that

$$\mathrm{Li}(x) \Big/ \frac{x}{\log x} \to 1 \quad \text{as } x \to \infty.$$

Exercise 5.8.12. Prove that 1 is the best choice for $B$ when approximating $\mathrm{Li}(x)$ by $x/(\log x - B)$.

Exercise 5.8.13.† Using the Maynard-Tao result, prove that there exists a positive integer $k \leq 246$ for which there are infinitely many prime pairs $p, p + k$.

17 $H$ is a proper subgroup of $G$ if it is a subgroup of $G$ but not the whole of $G$.




Exercise 5.8.14. Suppose that $a$ and $b$ are integers for which $g(a) = 1$ and $g(b) = -1$, where $g(x) \in \mathbb{Z}[x]$.
(a) Prove that $b = a - 2, a - 1, a + 1$, or $a + 2$.
(b)* Deduce that there are no more than four integer roots of $(g(x) - 1)(g(x) + 1) = 0$.
(c)† Show that if $g(x)$ has degree 2 and there are four integer roots of $(g(x) - 1)(g(x) + 1) = 0$, then $g(x) = h(x - A)$ where $h(t) = t^2 - 3t + 1$, with roots $A, A + 1, A + 2$, and $A + 3$.
(d)† Modify the proof of Theorem 5.4 to establish that if $f(x) \in \mathbb{Z}[x]$ has degree $d \geq 6$ and $|f(n)|$ is prime for $\geq d + 3$ integers $n$, then $f(x)$ is irreducible.

Let $f(x) = h(x)h(x - 4)$, which has degree 4. Note that $|f(n)|$ is prime for the eight values $n = 0, 1, \ldots, 7$, and so there is little room in which to improve (d).
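The example $f(x) = h(x)h(x - 4)$ is easy to confirm. In the sketch below (ours) the second list printed is $|f(0)|, \ldots, |f(7)|$; each entry is prime, although $f$ is visibly reducible. A small search over $-50 \leq n < 50$ finds no further prime values, as expected: $h(t) = \pm 1$ only for $t = 0, 1, 2, 3$, and otherwise both factors have absolute value at least 5.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def h(t: int) -> int:
    return t * t - 3 * t + 1

def f(x: int) -> int:
    return h(x) * h(x - 4)    # reducible, of degree 4

print([h(t) for t in range(4)])                              # [1, -1, -1, 1]
print([abs(f(n)) for n in range(8)])                         # [29, 19, 11, 5, 5, 11, 19, 29]
print([n for n in range(-50, 50) if is_prime(abs(f(n)))])    # [0, 1, 2, 3, 4, 5, 6, 7]
```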


One can show that there are reducible polynomials $f(x) \in \mathbb{Z}[x]$ of arbitrarily large degree $d$ for which $|f(n)|$ takes on at least $d + 1$ prime values: Let $p_1 < \cdots < p_m$ be distinct primes. Let $g(x) = \prod_{j=1}^{m} (p_j^2 - x^2)$ and $q = g(1)$. By Dirichlet’s Theorem (section 5.3) we know that there are infinitely many primes $p_0 \equiv 1 \pmod q$.¹⁸ We select one such prime and write $p_0 = 1 + \ell q$ for some positive integer $\ell$. Now let $f(x) = x(1 + \ell g(x))$, which has degree $d := 2m + 1$. We have that $|f(\pm 1)| = 1 + \ell q = p_0$ and $|f(\pm p_j)| = p_j$ for $j = 1, \ldots, m$, so there are $\geq 2m + 2 = d + 1$ integers $n$ for which $|f(n)|$ is prime.
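Here is the construction carried out for $(p_1, p_2) = (2, 3)$ and for $(p_1, p_2, p_3) = (2, 3, 5)$. Rather than appealing to Dirichlet’s Theorem, the sketch (ours; `build_example` is a hypothetical name) simply searches for the smallest $\ell \geq 1$ with $1 + \ell q$ prime, which is an arbitrary but convenient choice. It prints $|f(n)|$ at $n = \pm 1$ and $n = \pm p_j$.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def build_example(ps: list[int]):
    """Return (p0, ell, f) with f(x) = x(1 + ell*g(x)) and g(x) = prod (p_j^2 - x^2)."""
    q = math.prod(p * p - 1 for p in ps)      # q = g(1) = g(-1)
    ell = 1
    while not is_prime(1 + ell * q):          # smallest ell making p0 = 1 + ell*q prime
        ell += 1
    def f(x: int) -> int:
        return x * (1 + ell * math.prod(p * p - x * x for p in ps))
    return 1 + ell * q, ell, f

for ps in ([2, 3], [2, 3, 5]):
    p0, ell, f = build_example(ps)
    special = [s * n for n in [1] + ps for s in (1, -1)]   # ±1 and ±p_j
    print(ps, "p0 =", p0, "ell =", ell, {n: abs(f(n)) for n in special})
```

For $(2, 3)$ this gives $q = 24$, $\ell = 3$ and $p_0 = 73$, so the degree 5 polynomial $x(1 + 3(4 - x^2)(9 - x^2))$ takes at least six prime values.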


In the next exercise, assuming certain conjectures,¹⁹ we construct reducible polynomials $f(x) \in \mathbb{Z}[x]$ of arbitrarily large degree $d$ for which $|f(n)|$ takes on $d + 2$ prime values. This implies that the result in exercise 5.8.14(d) is “best possible”.

Exercise 5.8.15.† Assume that there are infinitely many positive integers $n$ for which $n^2 - 3n + 1$ is prime, and denote these integers by $n_1 < n_2 < \cdots$. Let $g_m(x) := (n_1 - x) \cdots (n_m - x)$. If $\ell$ is a positive integer for which $1 + \ell g_m(0), 1 + \ell g_m(1), 1 + \ell g_m(2), 1 + \ell g_m(3)$ are simultaneously prime, then prove that the polynomial $f(x) := (x^2 - 3x + 1)(1 + \ell g_m(x))$ has degree $d := m + 2$ and that there are exactly $d + 2$ integers $n$ for which $|f(n)|$ is prime.

18 We will prove this later, in Theorem 7.3.
19 These conjectures follow from the Polynomial prime values conjecture stated in the bonus section of this chapter.


Appendix 5A. Bertrand’s postulate and beyond


5.9. Bertrand’s postulate 
In 1845 Bertrand conjectured, on the basis of calculations up to a million: 


Theorem 5.6 (Bertrand’s postulate). For every integer $n \geq 1$, there is a prime number between $n$ and $2n$.

Bertrand’s postulate was proved in 1850 by Chebyshev. We will follow the 19-year-old Erdős’s proof, or, as N. J. Fine put it (in the voice of Erdős):
Chebyshev said it, but I'll say it again: 
There’s always a prime between n and 2n. 


Exercise 5.9.1. Show that prime $p$ does not divide $\binom{2n}{n}$ when $2n/3 < p \leq n$.


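Proof of Bertrand’s postulate. Let $p^{e_p}$ be the exact power of prime $p$ dividing $\binom{2n}{n}$. We know that

• $e_p = 1$ if $n < p \leq 2n$ by Kummer’s Theorem (Theorem 3.7),
• $e_p = 0$ if $2n/3 < p \leq n$ by exercise 5.9.1,
• $e_p \leq 1$ if $\sqrt{2n} < p \leq 2n$ by Corollary 3.10.1,
• $p^{e_p} \leq 2n$ if $p \leq 2n$ by Corollary 3.10.1.

Combining these gives

$$\frac{4^n}{2n} \leq \binom{2n}{n} = \prod_{p \leq 2n} p^{e_p} \leq \prod_{n < p \leq 2n} p \times \prod_{p \leq 2n/3} p \times \prod_{p \leq \sqrt{2n}} p^{e_p} \leq \prod_{n < p \leq 2n} p \times 4^{2n/3 - 1} \times (2n)^{(\sqrt{2n} + 1)/2},$$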



using Lemma 5.5.1 to bound $\prod_{p \leq 2n/3} p$ and the bound $\pi(\sqrt{2n}) \leq \frac{1}{2}(\sqrt{2n} + 1)$ (as neither 1 nor any even integer $> 2$ is prime). Taking logarithms we deduce that

$$\sum_{\substack{n < p \leq 2n \\ p \text{ prime}}} \log p \geq \frac{\log 4}{3}\, n - \frac{\sqrt{2n} + 3}{2} \log(2n).$$

This implies that

$$\sum_{\substack{n < p \leq 2n \\ p \text{ prime}}} \log p \geq \frac{n}{3} \qquad (5.9.1)$$

for all $n \geq 2349$, which implies Bertrand’s postulate in this range. (This lower bound should be compared to the upper bound (5.5.4).)
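The threshold 2349 comes from comparing the lower bound just derived with $n/3$. This sketch (ours; the search range below $10^5$ is arbitrary) evaluates $\frac{\log 4}{3} n - \frac{\sqrt{2n}+3}{2}\log(2n) - \frac{n}{3}$ in floating point and reports one more than the largest $n < 10^5$ at which it is negative. The margin near the threshold is only a few hundredths, so the boundary is delicate; beyond it the expression keeps growing.

```python
import math

def margin(n: int) -> float:
    """(log 4 / 3) n - ((sqrt(2n) + 3) / 2) log(2n) - n / 3."""
    return (math.log(4) / 3) * n - (math.sqrt(2 * n) + 3) / 2 * math.log(2 * n) - n / 3

last_negative = max(n for n in range(1, 10**5) if margin(n) < 0)
print(last_negative + 1)                  # 2349
print(margin(2348), margin(2349))         # roughly -0.023 and +0.029
```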


If 1 < n < 5000, then the interval (n, 2n] contains at least one of the primes 2, 
3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503, and 5003. 
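The list works because each prime in it is less than twice its predecessor, so for $1 \leq n < 5000$ the first listed prime exceeding $n$ lies in $(n, 2n]$. A quick check (ours, with trial division and the hypothetical constant name `CHAIN`):

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

CHAIN = [2, 3, 5, 7, 13, 23, 43, 83, 163, 317, 631, 1259, 2503, 5003]

print(all(is_prime(p) for p in CHAIN))                    # every entry is prime
print(all(q < 2 * p for p, q in zip(CHAIN, CHAIN[1:])))   # each is < twice the previous
print(all(any(n < p <= 2 * n for p in CHAIN) for n in range(1, 5000)))
```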


Exercise 5.9.2. Use Bertrand’s postulate to prove that there are infinitely many primes with 
first digit “1”. 


Exercise 5.9.3. Use Bertrand’s postulate to show, by induction, that every integer n > 6 can be 
written as the sum of distinct primes. 


Exercise 5.9.4. Goldbach conjectured that every even integer > 6 can be written as the sum of 
two primes. Deduce Bertrand’s postulate from Goldbach’s conjecture. 


1 i 1 i 1 


Exercise 5.9.5. Use Bertrand’s postulate to prove that 1 is never an integer. 


n+1 ' n+2' “OT On 
Exercise 5.9.6. Prove that for every $n \geq 1$ one can partition the set of integers $\{1, 2, \ldots, 2n\}$ into pairs $\{a_1, b_1\}, \ldots, \{a_n, b_n\}$ such that each sum $a_j + b_j$ is a prime.


Exercise 5.9.7.† (a) Prove that prime $p$ divides $\binom{2n}{n}$ when $n/2 < p \leq 2n/3$.
(b) Prove that the product of the primes in (3m, 12m] divides oa Cc. 
(c)† Deduce that we can take any constant $c_2 > \frac{2}{9}\log(432)$ in (5.5.1).

(Note that $\frac{2}{9}\log(432) = 1.3485\ldots < \log 4 = 1.3862\ldots$)


(d) Now deduce Bertrand’s postulate for all sufficiently large x from (5.5.1). 


5.10. The theorem of Sylvester and Schur 


Bertrand’s postulate can be rephrased to state that at least one of the integers 
k+1,k+2,...,2k has a prime factor > k. This can be generalized as follows: 


Theorem 5.7 (Sylvester-Schur Theorem). For any integers $n \geq k \geq 1$, at least one of the integers $n + 1, n + 2, \ldots, n + k$ is divisible by a prime $p > k$.


Proposition 5.10.1. If, for given integers $n \geq k \geq 1$, we have

$$\binom{n+k}{k} > (n+k)^{\pi(k)}, \qquad (5.10.1)$$

then at least one of the integers $n + 1, n + 2, \ldots, n + k$ is divisible by a prime $p > k$. If (5.10.1) holds for $n_1(k)$, then it holds for all $n \geq n_1(k)$.




Proof. If the prime factors of $n + 1, n + 2, \ldots, n + k$ are all $\leq k$, then all of the prime factors $p$ of $\binom{n+k}{k}$ are $\leq k$. If $p^e \,\|\, \binom{n+k}{k}$ then $p^e \leq n + k$ by Corollary 3.10.1. Therefore

$$\binom{n+k}{k} \leq \prod_{p \leq k} (n+k) = (n+k)^{\pi(k)}, \qquad (5.10.2)$$

contradicting (5.10.1). This proves the first part of the result.
We prove the second part by induction on $n \geq n_1(k)$ using the following result.


k 
Exercise 5.10.1. Prove that ( + a) < (1 + x) for alla >k>1. 


The result holds for $n = n_1(k)$, so now suppose that (5.10.1) holds for some given $n$. Then


_ 1(k) nm (k) 
( k ) (1 =a) k Je( i cae eter ny 


by exercise 5.10.1 and the induction hypothesis, and so (5.10.1) holds for $n + 1$. The result follows.


Proof of the Sylvester-Schur Theorem for all $k \leq 1500$. Calculations give some value for $n_1(k)$ in Proposition 5.10.1 for all $k \leq 1500$, and so the Sylvester-Schur Theorem follows for these $k$ and all $n \geq n_1(k)$ by Proposition 5.10.1. Now $n_1(k) = k$ for $202 \leq k \leq 1500$, and $k \leq n_1(k) \leq k + 17$ for all $k \leq 201$. We verify the theorem for $k \leq n \leq k + 16$ with $k \leq 201$, case by case.


A just failed proof of the Sylvester-Schur Theorem. Calculations suggest that $\binom{2k}{k} > (2k)^{\pi(k)}$ for all $k \geq 202$. If so, the Sylvester-Schur Theorem follows for all $k \geq 202$ by Proposition 5.10.1. However we just failed to prove this inequality as a consequence of the upper bound in Theorem 5.3. If one combines the upper bound on $\pi(k/4)$ from Theorem 5.3 together with exercise 5.9.7(b), then we can prove that $\binom{2k}{k} > (2k)^{\pi(k)}$ for all sufficiently large $k$. However “sufficiently large” here is likely to be extremely large.
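The inequality $\binom{2k}{k} > (2k)^{\pi(k)}$ is within reach of exact computation for $202 \leq k \leq 1500$, and the theorem itself can be brute-forced for small $n$ and $k$. The sketch below (ours; the ranges and helper names are arbitrary) does both with Python’s exact `math.comb`, and also shows that the inequality fails at $k = 201$, consistent with the range $k \geq 202$ quoted above. This is a finite check, not a proof.

```python
import math

def prime_flags(limit: int) -> bytearray:
    """flags[n] == 1 exactly when n is prime (0 <= n <= limit)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return flags

def pi_table(limit: int) -> list[int]:
    """pi_table(L)[n] == pi(n) for 0 <= n <= L."""
    flags, table, count = prime_flags(limit), [], 0
    for n in range(limit + 1):
        count += flags[n]
        table.append(count)
    return table

def largest_prime_factors(limit: int) -> list[int]:
    """lpf[n] = largest prime factor of n (with lpf[0] = lpf[1] = 1)."""
    lpf = [1] * (limit + 1)
    for p in range(2, limit + 1):
        if lpf[p] == 1:                   # untouched so far, hence p is prime
            for multiple in range(p, limit + 1, p):
                lpf[multiple] = p         # p increases, so the last write is the largest
    return lpf

PI = pi_table(1500)
print(all(math.comb(2 * k, k) > (2 * k) ** PI[k] for k in range(202, 1501)))
print(math.comb(402, 201) > 402 ** PI[201])

# Brute force: for k <= n, some n + i (1 <= i <= k) has a prime factor > k.
K, N = 60, 600
lpf = largest_prime_factors(N + K)
print(all(any(lpf[n + i] > k for i in range(1, k + 1))
          for k in range(1, K + 1) for n in range(k, N + 1)))
```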


Exercise 5.10.2. Prove that if m(k) < en —1 for all integers k > 1, then Theorem[5.7J holds 
for all $n \geq k \geq 1$.


Proof of the Sylvester-Schur Theorem for all $k > 1500$. If (5.10.1) holds, then the result follows from Proposition 5.10.1. Hence we may assume that (5.10.2) holds. Now, $\pi(k) \leq k/3$ (which can be proved by accounting for divisibility by 2 and 3), and $\frac{n+k-j}{k-j} \geq \frac{n+k}{k}$ for $j = 0, \ldots, k - 1$, so that $\binom{n+k}{k} \geq \left(\frac{n+k}{k}\right)^k$. Therefore (5.10.2) implies that

$$\left(\frac{n+k}{k}\right)^k \leq \binom{n+k}{k} \leq (n+k)^{\pi(k)} \leq (n+k)^{k/3},$$

which in turn implies that

$$n + k \leq k^{3/2}, \quad \text{that is,} \quad n \leq k^{3/2} - k.$$




Next we note that if $p > (n+k)^{1/2}$ and $p^e \,\|\, \binom{n+k}{k}$, so that $p^e \leq n + k$, then $e = 0$ or $1$. Therefore we can refine (5.10.2) to


k 
(5.10.3) eS ) < [J (™+h [[ ep = 2 ae, 
pS (n+k)1/2 psk 
by G54), as m((n + k)/?) < (mn +k)? < 4k3/4 
Now if $n \geq 3k$, then, by exercise 4.14.2 of appendix 4D,
(44/3°)* at ef[Or*\ 2 paket gk-1 
ek Nhe p= k - 


which is false for all k > 1. Thereforen+k < 4k, and so ifn+k> 3k, then our 
inequality becomes 

ajar. f Bald n+k 1/2 op 

SPS (BRD) (MER caer 
This is false for all k > 780. 

Finally for the range $k \leq n < 3k/2$, if prime $p$ is in the range $(n+k)/3 < p \leq k$, then $2p$ is the only multiple of $p$ that appears in $(n+1) \cdots (n+k)$ and so $p$ does not divide $\binom{n+k}{k}$. Therefore


G)<C( Es TL @+) TL vs TL ear’ TT». 


p<(n-+k)3/2 PS(m+k)/3 p< (n+k)}/2 pSdk/6 


which implies that 
4® 2 (4k)¥'? 458/6-1 
ek 

which is false for all k > 1471. 


Exercise 5.10.3. (a) Use Bertrand’s postulate and the Sylvester-Schur Theorem to show that if $1 \leq r < s$, then there is a prime $p$ that divides exactly one of the integers $r + 1, \ldots, s$.
(b) Deduce that if $1 \leq r < s$, then $\frac{1}{r+1} + \cdots + \frac{1}{s}$ is never an integer.


Bonus read: A review of prime problems


5.11. Prime problems 


In this bonus section we will discuss various natural sequences that are expected to 
contain infinitely many primes, highlighting recent progress. 


Mathematicians have tried in vain to discover some order in the sequence 
of the prime numbers and we have every reason to believe that there are 
some mysteries that the human mind shall never penetrate. 

— LEONHARD EULER (1740) 


Prime values of polynomials in one variable 


In section 5.6 we mentioned the twin prime conjecture, that there are infinitely many pairs of primes that differ by 2. What about other pairs? Obviously there can be no more than one pair of primes that differ by an odd integer $k$ (as one of the two integers must be divisible by 2), but when the difference is an even integer $k$ there is no such obstruction. Calculations then suggest that:

For all even integers $2m > 0$ there are infinitely many pairs of primes that differ by $2m$. That is, there are infinitely many prime pairs $p, p + 2m$.

Here we asked for simultaneous prime values of two monic linear polynomials $x$ and $x + 2m$. What if we select polynomials with different leading coefficients, like $x$ and $2x + 1$? Such prime pairs come up naturally in Sophie Germain’s Theorem (of section 7.27 in appendix 7F) and calculations support the guess that there are many (like 3 and 7; 5 and 11; 11 and 23; 23 and 47; ...). We therefore conjecture:


There are infinitely many pairs of primes p,2p + 1. 


One can generalize this to other pairs of linear polynomials but we might again have the problem that at least one is even, as with $p, 3p + 1$.




Exercise 5.11.1. Give conditions on integers $a, b, c, d$ with $a, c > 0$, assuming that $(a, b) = (c, d) = 1$, which guarantee that there are infinitely many integers $n$ for which $an + b$ and $cn + d$ are different and both positive and odd. We conjecture, under these conditions, that:

There are infinitely many pairs of primes $am + b, cm + d$.


For triples of linear forms and even $k$-tuplets of linear forms, there are more exceptional cases. For example, the three polynomials $n, n + 2, n + 4$ can all simultaneously take odd values but, for each integer $n$, one of them is divisible by 3. We call 3 a fixed prime divisor, which plays the same role as 2 in the example $n, n + k$ with $k$ odd. In general we need that a given set of linear forms $a_1 x + b_1, a_2 x + b_2, \ldots, a_k x + b_k$ with integer coefficients is admissible; that is, there is no fixed prime divisor $p$. Specifically, for each prime $p$, there exists an integer $n_p$ for which none of the $a_j n_p + b_j$ is divisible by $p$, which implies that $p$ does not divide $a_j n + b_j$ for $1 \leq j \leq k$ for every integer $n \equiv n_p \pmod p$. This leads us to


The prime k-tuplets conjecture. Let $a_1 x + b_1, \ldots, a_k x + b_k$ be an admissible set of $k$ linear polynomials with integer coefficients, such that each $a_j$ is positive. Then there are infinitely many positive integers $m$ for which

$a_1 m + b_1, \ldots, a_k m + b_k$ are all prime.
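Admissibility can be tested mechanically, and only finitely many primes need checking: if $p \nmid a_j$, the form $a_j x + b_j$ vanishes modulo $p$ on exactly one residue class, while if $p \mid a_j$ it vanishes on none of them or on all of them, the latter exactly when $p \mid \gcd(a_j, b_j)$. So a prime $p > k$ can be a fixed prime divisor only if it divides some $\gcd(a_j, b_j)$. The helper `is_admissible` below is a hypothetical name for this check (our sketch, not code from the text); forms are given as pairs $(a_j, b_j)$.

```python
import math

def prime_factors(n: int) -> set[int]:
    """The set of primes dividing n, for n >= 1."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_admissible(forms: list[tuple[int, int]]) -> bool:
    """forms = [(a_1, b_1), ..., (a_k, b_k)] for a_j x + b_j with each a_j > 0.
    True if no prime divides (a_1 n + b_1)...(a_k n + b_k) for every integer n."""
    k = len(forms)
    candidates = {p for p in range(2, k + 1) if prime_factors(p) == {p}}
    for a, b in forms:
        candidates |= prime_factors(math.gcd(a, b))
    for p in candidates:
        # p is a fixed prime divisor if every residue n mod p is a root of some form
        if not any(all((a * n + b) % p for a, b in forms) for n in range(p)):
            return False
    return True

print(is_admissible([(1, 0), (1, 2)]))          # n, n+2: True
print(is_admissible([(1, 0), (1, 2), (1, 4)]))  # n, n+2, n+4: False (3 divides one)
print(is_admissible([(1, 0), (3, 1)]))          # p, 3p+1: False (one is even)
print(is_admissible([(1, 0), (2, 1)]))          # p, 2p+1: True
```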


Exercise 5.11.2.† Assuming the prime $k$-tuplets conjecture deduce that there are infinitely many pairs of consecutive primes $p, p + 100$.

Exercise 5.11.3.† Assuming the prime $k$-tuplets conjecture deduce that there are infinitely many triples of consecutive primes in an arithmetic progression.

Exercise 5.11.4.† Assuming the prime $k$-tuplets conjecture deduce that there are infinitely many quadruples of consecutive primes formed of two pairs of prime twins.

Exercise 5.11.5.† Let $a_{n+1} = 2a_n + 1$ for all $n \geq 0$. Fix an arbitrarily large integer $N$. Use the prime $k$-tuplets conjecture to show that we can choose $a_0$ so that $a_0, a_1, \ldots, a_N$ are all primes.

Exercise 5.11.6. Show that the set of linear polynomials $a_1 m + 1, a_2 m + 1, \ldots, a_k m + 1$, with each $a_j$ positive, is admissible.


There is more on prime k-tuplets of linear polynomials in appendix 5E. 


What about other polynomials? For example, the polynomial $n^2 + 1$ takes prime values 2, 5, 17, 37, 101, ... seemingly on forever, so we conjecture that:

There are infinitely many primes of the form $n^2 + 1$.

The polynomial $x^2 + 2x$ cannot be prime for many integer values since it is reducible (recall Theorem 5.4 and exercise 5.8.14(c)). This is a different reason (from the fixed prime factors above) for a polynomial not to take more than finitely many prime values. These are the only reasons known for a polynomial not to take infinitely many prime values and, if neither of them holds, then we believe that the polynomial does take on infinitely many prime values. More precisely:


Polynomial prime values conjecture. Let $f_1(x), \dots, f_k(x) \in \mathbb{Z}[x]$, each irreducible, with positive leading coefficients. If $f_1\cdots f_k$ has no fixed prime divisor, then:
There are infinitely many integers $m$ for which $f_1(m), \dots, f_k(m)$ are all prime.
To be precise, if $f_1, \dots, f_k$ have "no fixed prime divisor", then we mean that for every prime $p$ there exists an integer $n_p$ such that $f_1(n_p)\cdots f_k(n_p)$ is not divisible by $p$. The polynomial prime values conjecture specialized to linear polynomials is the prime k-tuplets conjecture.²⁰


Exercise 5.11.7. Prove that the only prime pair $p, p^2+2$ is $3, 11$.


Exercise 5.11.8. (a) Prove that if $f_1\cdots f_k$ has no fixed prime divisor, then, for each prime $p$, there are infinitely many integers $n$ such that $f_1(n)\cdots f_k(n)$ is not divisible by $p$.
(b)* Show that if $p > \deg(f_1(x)\cdots f_k(x))$ and $p$ does not divide $f_1(x)\cdots f_k(x)$, then $n_p$ exists.
(c) Prove that if $f_j(x) = x + h_j$ for given integers $h_1, \dots, h_k$, then $n_p$ exists for a given prime $p$ if and only if $\#\{\text{distinct } h_j \bmod p\} < p$.


The only case of the polynomial prime values conjecture that has been proved is when $k = 1$ and $f(x)$ is linear. The hypothesis ensures that $f(x) = qx + a$ with $q \ge 1$ and $(a, q) = 1$. This is Dirichlet's Theorem (that there are infinitely many primes $\equiv a \pmod q$ whenever $(a, q) = 1$), which we discuss in section 8.17 of appendix 8D and in section 13.7.


Distinguishing primes and $P_k$'s from other integers. The Möbius function was introduced in section 4.5, and in Corollary 4.5.1 we saw that the sum

$$\sum_{d\mid n}\mu(d)$$

is non-zero only if $n = 1$, and so allows us to distinguish the integer 1 from all other positive integers. In appendix 4B we saw that if the sum

$$\sum_{d\mid n}\mu(d)\log(n/d)$$

is non-zero, then $n$ has exactly one distinct prime factor, and so this sum allows us to distinguish primes and prime powers from all other positive integers. A positive integer is called a "$P_k$" if it has no more than $k$ distinct prime factors. In the next exercise we will see how an analogous sum allows us to distinguish $P_k$'s.
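Both facts are easy to check numerically for small $n$. In this minimal Python sketch, `mobius`, `mobius_sum` and `mangoldt_via_mobius` are our own helper names, and trial division is only sensible for small inputs.

```python
from math import log

def mobius(n: int) -> int:
    """Möbius function mu(n), computed by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:      # d^2 divides n
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius_sum(n: int) -> int:
    """sum_{d | n} mu(d)."""
    return sum(mobius(d) for d in divisors(n))

def mangoldt_via_mobius(n: int) -> float:
    """sum_{d | n} mu(d) log(n/d): log p when n = p^k, and 0 otherwise."""
    return sum(mobius(d) * log(n / d) for d in divisors(n))

print([n for n in range(1, 50) if mobius_sum(n) != 0])                   # [1]
print([n for n in range(1, 50) if abs(mangoldt_via_mobius(n)) > 1e-9])   # the prime powers below 50
```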


Exercise 5.11.9.† (a)† Let $x_0, \dots, x_m$ be variables. Prove that if $m > k \ge 0$, then

$$\sum_{S\subseteq\{1,2,\dots,m\}}(-1)^{|S|}\Big(x_0+\sum_{j\in S}x_j\Big)^k = 0.$$

(b) Deduce that if $n$ has more than $k$ different prime factors, then

$$\sum_{d\mid n}\mu(d)\,\big(\log(n/d)\big)^k = 0.$$

(c)† What value does this take when $n$ has exactly $k$ different prime factors?


Exercise 5.11.10. Show that if each prime factor of $n$ is $> n^{1/3}$, then $n$ is either prime or the product of two primes.


Prime values of polynomials in several variables 


One can ask for prime values of polynomials in two or more variables, for example, primes of the form $m^2+n^2$ or the form $a^2+b^2+1$, or more complicated polynomials of mixed degree like $4a^3+27b^2$. What is known?


²⁰This conjecture was first formulated by Andrzej Schinzel in 1958. He called it "Hypothesis H" in that paper, and the name has stuck.


178 Bonus read: A review of prime problems 


The proof of the prime number theorem can be adapted to many situations, for example to primes of the form $m^2+n^2$ or the form $2u^2+2uv+3v^2$, or indeed to the prime values of any irreducible binary quadratic form (which are discussed in chapters 9 and 12) without a fixed prime divisor. The proof for $m^2+n^2$ uses the fact that $m^2+n^2 = (m+in)(m-in)$, the norm of $m+in$. One can develop this to prove that any such norm form (the appropriate generalization²¹ of $m^2+n^2$ to higher degree) takes on infinitely many prime values as long as it has no fixed prime factor. A norm form is always a degree $d$ polynomial in $d$ variables.


One can then ask for prime values of norm forms in which we fix some of the variables (perhaps to 0). For example, if $m = 1$ in $m^2+n^2$, we are back to the open question about prime values of $n^2+1$. However in 2002 Heath-Brown was able to prove that $a^3+2b^3$ takes on infinitely many prime values, and then extended this, with Moroz, to any irreducible cubic form in two variables. In 2018, Maynard proved such a result for a family of norm forms²² in $3m$ variables of degree $4m$ (or less).


These results on norm forms were all inspired by Friedlander and Iwaniec's 1998 breakthrough, in which they took $n$ to be a square in $m^2+n^2$ (and therefore found prime values of $u^2+v^4$), following Fouvry and Iwaniec's 1997 paper, in which they took $n$ to be prime (and therefore obtained infinitely many prime pairs $p, m^2+p^2$). This was the first example in which the polynomial in question is sparse, in that the number of integer values it takes up to $x$ is roughly $x^c$ for some $c < 1$. The current record sparsity is $c = 2/3$, from the work of Heath-Brown and Moroz. In 2017, Heath-Brown and Xiannan Li went beyond the Fouvry-Iwaniec and Friedlander-Iwaniec results by showing that there are infinitely many prime pairs $p, m^2+p^4$.


In every case we expect that the proportion of values of the polynomial up to $x$ which are prime is about $c/\log x$, where $c$ is a constant which depends on how often each prime divides values of the polynomial.


Back in 1974, Iwaniec had shown how versatile sieve methods could be by showing that any quadratic polynomial in two variables (which is irreducible and has no fixed prime divisor) takes on infinitely many prime values, for example, $m^2+n^2+1$. We will see this result put to good use in appendix 12G when tiling a circle with smaller circles.


What about the prime values of more than one polynomial in several variables? 
We can generalize our conjectures as follows: 


Multivariable polynomial prime values conjecture. Let $f_1(x_1,\dots,x_n), \dots, f_k(x_1,\dots,x_n) \in \mathbb{Z}[x_1,\dots,x_n]$, each of which is irreducible. Suppose that there are infinitely many $n$-tuplets of integers $m_1,\dots,m_n$ for which each $f_j(m_1,\dots,m_n)$ is positive. If $f_1\cdots f_k$ has no fixed prime divisor, then there are
infinitely many $n$-tuplets of integers $m_1,\dots,m_n$ for which
$f_1(m_1,\dots,m_n), \dots, f_k(m_1,\dots,m_n)$ are all prime.


In 1939, van der Corput showed that there are infinitely many three-term arithmetic progressions of primes, which can be written as

$$a,\ a+d,\ a+2d,$$

three degree-one polynomials in two variables. For a long time, methods seemed inadequate to extend this to length four arithmetic progressions, but this was resolved in 2008 by Green and Tao, who proved that for any fixed integer $k \ge 3$ there are infinitely many prime $k$-tuplets of the form

$$a,\ a+d,\ a+2d,\ \dots,\ a+(k-1)d.$$


²¹More precisely, the norm of $\sum_i x_i\omega_i$, where the $\omega_i$ are a basis for the ring of integers of some number field of degree $d$ and the $x_i$ are the variables.
²²The norm of $\sum_{i=0}^{3m-1}x_i\omega^i$, where the field, of degree $4m$, is generated by $\omega$ over $\mathbb{Q}$.


The methods used were quite new to the search for prime numbers and this has 
led to widespread interest. In 2012, along with Ziegler, they were able to prove a 
very general result for linear polynomials, which is as good as one can hope for, 
given that there has been no progress directly on the prime k-tuplets conjecture: 


Until we prove the twin prime conjecture we will be unable to prove the multivariable polynomial prime values conjecture in full generality, even for linear polynomials, since two of the polynomials might differ by two, for example if $x+3y$ and $x+3y+2$ are in our set. More generally, without progress on the prime k-tuplets conjecture, we must avoid any linear relation between two of our polynomials.


Theorem 5.8 (The Green-Tao-Ziegler Theorem). Suppose that $f_1(\mathbf{x}), \dots, f_k(\mathbf{x})$ are linear polynomials which satisfy the hypothesis of the multivariable polynomial prime values conjecture. Moreover assume that if $1 \le i < j \le k$, there do not exist integers $a, b, c$, not all zero, for which $af_i + bf_j = c$. Then there are infinitely many $\mathbf{m} \in \mathbb{Z}^n$ for which $f_1(\mathbf{m}), \dots, f_k(\mathbf{m})$ are all prime.
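The Green-Tao theorem is the special case of Theorem 5.8 with the two variables $(a, d)$ and the forms $f_j(a,d) = a + (j-1)d$, no two of which satisfy a relation $\alpha f_i + \beta f_j = c$ with $\alpha, \beta, c$ not all zero. The theorem guarantees infinitely many such progressions but does not exhibit them; for small $k$ a brute-force search finds examples quickly. This is a minimal Python sketch with our own hypothetical helper names.

```python
from typing import Optional

def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_prime_ap(a: int, d: int, k: int) -> bool:
    """True if a, a + d, ..., a + (k-1)d are all prime."""
    return all(is_prime(a + j * d) for j in range(k))

def smallest_difference(a: int, k: int, max_d: int = 10_000) -> Optional[int]:
    """Smallest d >= 1 for which a, a + d, ..., a + (k-1)d are all prime (None if d > max_d)."""
    for d in range(1, max_d + 1):
        if is_prime_ap(a, d, k):
            return d
    return None

print(smallest_difference(3, 3), smallest_difference(5, 5), smallest_difference(7, 6))  # 2 6 30
```

For instance, $7, 37, 67, 97, 127, 157$ is a six-term progression of primes; the common difference must be a multiple of $30$ here, since otherwise $2$, $3$ or $5$ would divide one of the terms.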


We will discuss applications of the Green-Tao-Ziegler Theorem in appendix 5E. 


It is not difficult to show that there are infinitely many primes of the form $b^2-4ac$, the discriminant of an arbitrary quadratic polynomial. However we do not know how to prove that there are infinitely many primes of the form $4a^3+27b^2$, which is (up to sign) the discriminant of the cubic polynomial $x^3+ax+b$. Proving this would have a significant impact on our understanding of various questions about degree 3 Diophantine equations.


Exercise 5.11.11. Let $g(x) = 1 + \prod_{j=1}^{k}(x-j)$. Prove that there exist integers $a$ and $b$ such that the reducible polynomial $f(x) = (ax+b)g(x)$ is prime when $x = n$ for $1 \le n \le k$. Compare this to the result in exercise 5.8.14(c) (with $d = k+1$).


Goldbach’s conjecture and variants 


Goldbach's 1742 conjecture is the statement that every even integer $\ge 4$ can be written as the sum of two primes. It is still an open question, though it has now been verified for all even numbers $\le 4 \times 10^{18}$.
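Such verifications are, at heart, a search for a representation $n = p + q$ for each even $n$. Here is a minimal Python sketch of that search on a tiny range (the helper names are our own; large-scale verifications use far more refined methods).

```python
from math import isqrt

def prime_flags(n: int) -> bytearray:
    """is_p[i] == 1 exactly when i is prime, for 0 <= i <= n (sieve of Eratosthenes)."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return is_p

def has_goldbach_pair(n: int, is_p: bytearray) -> bool:
    """Stops at the first prime p <= n/2 with n - p also prime."""
    return any(is_p[p] and is_p[n - p] for p in range(2, n // 2 + 1))

def goldbach_pairs(n: int, is_p: bytearray) -> list[tuple[int, int]]:
    """All ways of writing n = p + q with primes p <= q."""
    return [(p, n - p) for p in range(2, n // 2 + 1) if is_p[p] and is_p[n - p]]

LIMIT = 10_000
is_p = prime_flags(LIMIT)
exceptions = [n for n in range(4, LIMIT + 1, 2) if not has_goldbach_pair(n, is_p)]
print(exceptions)                   # []
print(goldbach_pairs(100, is_p))    # [(3, 97), (11, 89), (17, 83), (29, 71), (41, 59), (47, 53)]
```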


Great problems motivate mathematicians to think of new techniques, which 
can have great influence on the subject, even if they fail to resolve the original 
question. For example, although there have been few plausible ideas for proving 
Goldbach’s conjecture, it has motivated some of the development of sieve theory, 
and there are some beautiful results on modifications of the original problem. The 
most famous are: 

In 1975 Montgomery and Vaughan showed that if there are any exceptions to Goldbach's conjecture (that is, even integers $n$ that are not the sum of two primes), then there are very few of them.




In 1973 Jingrun Chen showed that every sufficiently large even integer is the 
sum of a prime and an integer that is the product of at most two primes. Here 
“sufficiently large” means enormous. 


In 1937 I. M. Vinogradov proved that every sufficiently large odd integer is the sum of three primes. The "sufficiently large" has recently been removed: Harald Helfgott, with computational assistance from David Platt, proved that every odd integer $> 1$ is the sum of at most three primes.


Exercise 5.11.12. Show that the Goldbach conjecture is equivalent to the statement that every integer $> 1$ is the sum of at most three primes.²³


Other questions 


Before this chapter we asked if there are infinitely many primes of the form $2^p-1$ (Mersenne primes) or of the form $2^{2^n}+1$ (Fermat primes). We can ask other questions in this vein, for example about prime values of second-order linear recurrences which start $0, 1$ (like the Fibonacci numbers) or of their companion sequences (see exercise 3.9.3), or about prime values of higher-order linear recurrence sequences.


Mersenne primes written in binary look like $111\ldots111$, and so are palindromic. Some people have been interested in primes of the form $\frac{1}{9}(10^n-1)$, which equal $111\ldots111$ in base 10 and so are palindromic. We are unable to prove there are infinitely many Mersenne primes, so how about the easier question: are there infinitely many palindromic primes when written in binary, or in decimal, or indeed in any other base? Also open.
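The "repunits" $111\ldots1$ in base $b$ are $(b^n-1)/(b-1)$, so base 2 gives the Mersenne numbers and base 10 gives $\frac19(10^n-1)$. This minimal Python sketch lists the small $n$ giving primes. It assumes the Miller-Rabin test with the first 13 primes as bases is deterministic below about $3.3\times10^{24}$ (a published computational result, not from this text), which covers every number tested here.

```python
def is_prime_mr(n: int) -> bool:
    """Miller-Rabin with the 13 prime bases up to 41 (assumed deterministic for n < 3.3e24)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
    for p in bases:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:          # write n - 1 = 2^r * d with d odd
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a witnesses that n is composite
    return True

def repunit(n: int, base: int = 10) -> int:
    """The number written as n ones in the given base: (base^n - 1) / (base - 1)."""
    return (base ** n - 1) // (base - 1)

decimal = [n for n in range(1, 24) if is_prime_mr(repunit(n))]
binary = [n for n in range(1, 62) if is_prime_mr(repunit(n, 2))]
palindromic = [n for n in range(1, 1000) if str(n) == str(n)[::-1] and is_prime_mr(n)]
print(decimal)   # [2, 19, 23]
print(binary)    # [2, 3, 5, 7, 13, 17, 19, 31, 61]
print(len(palindromic), palindromic)
```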


We saw earlier that it is not difficult to show that there are infinitely many 
primes with the first few digits given. But how about missing digits? Can one find 
infinitely many primes which have no 7 in their decimal expansion or no 9 or no 
consecutive digits 123? These questions are all answered in a remarkable recent 
paper of Maynard [4]. 


Let $M$ be a given $n$-by-$n$ matrix. The $(i,j)$th entries of $M, M^2, \dots$ can all be described by an $n$th-order linear recurrence sequence. To see this, think of the powers of $\begin{pmatrix}2&0\\0&1\end{pmatrix}$. We have already asked whether the trace can take infinitely many prime values. A recent question of interest is to take two (or more) such matrices, $M$ and $N$ say, and then look at the entries of all "words" created by $M$ and $N$, for example $M^{a_1}N^{b_1}M^{a_2}\cdots N^{b_r}$, and ask whether the entries are infinitely often prime (see appendix 9D, and appendix 12G for a beautiful example).
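For $2\times2$ matrices the recurrence comes from the Cayley-Hamilton identity $M^2 = \mathrm{tr}(M)\,M - \det(M)\,I$: multiplying by $M^n$ shows that every entry of $M^n$ satisfies $a_{n+2} = \mathrm{tr}(M)\,a_{n+1} - \det(M)\,a_n$. The general $n\times n$ case uses the characteristic polynomial of $M$ in the same way, which this minimal Python sketch (helper names our own) does not implement.

```python
def mat_mul(A, B):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)] for i in range(2)]

def powers(M, count):
    """The list [M, M^2, ..., M^count]."""
    out, P = [], M
    for _ in range(count):
        out.append(P)
        P = mat_mul(P, M)
    return out

def entries_satisfy_recurrence(M, count=30):
    """Check a_{n+2} = tr(M) a_{n+1} - det(M) a_n for every entry of the powers of M."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    P = powers(M, count)
    return all(P[n + 2][i][j] == tr * P[n + 1][i][j] - det * P[n][i][j]
               for n in range(count - 2) for i in range(2) for j in range(2))

D = [[2, 0], [0, 1]]
print([P[0][0] + P[1][1] for P in powers(D, 6)])    # traces 2^n + 1: [3, 5, 9, 17, 33, 65]
print(entries_satisfy_recurrence([[1, 1], [1, 0]]), entries_satisfy_recurrence([[3, -1], [5, 7]]))
```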


Guides to conjectures and the Green-Tao Theorem 


[1] David Conlon, Jacob Fox, and Yufei Zhao, The Green-Tao theorem: An exposition, EMS Surv. 
Math. Sci. 1 (2014), 249-282. 


²³This was in fact the form in which Goldbach made his conjecture. Goldbach was a friend of Euler, arguably the greatest mathematician of the 18th century, and would often send Euler mathematical questions. In one letter Goldbach asked whether every integer $> 1$ is the sum of at most three primes, and Euler observed that this is equivalent to showing that every even number $\ge 4$ is the sum of two primes. Why then does Goldbach get credit for this conjecture that he did not make? Perhaps because "Euler is rich, and Goldbach is poor."




[2] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’; III: On the expression 
of a number as a sum of primes, Acta Math. 44 (1923), 1-70. 


[3] Bryna Kra, The Green-Tao theorem on arithmetic progressions in the primes: An ergodic point of view, Bull. Amer. Math. Soc. 43 (2006), 3-23.


[4] James Maynard, Small gaps between primes, Annals Math. 181 (2015), 383-413. 


[5] A. Schinzel and W. Sierpiński, Sur certaines hypothèses concernant les nombres premiers, Acta Arith. 4 (1958), 185-208; erratum 5 (1958), 259.


Appendix 5B. An important 
proof of infinitely many primes 


5.12. Euler’s proof of the infinitude of primes 


In the 18th century Euler gave a different proof that there are infinitely many primes, one which would prove highly influential in what was to come later. Suppose again that the list of primes is $p_1 < p_2 < \cdots < p_k$. Euler observed that the fundamental theorem of arithmetic implies that the sets $\{n \ge 1 : n \text{ is a positive integer}\}$ and $\{p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k} : a_1, a_2, \dots, a_k \ge 0\}$ contain the same numbers. Therefore a sum involving the elements of the first set must equal the analogous sum involving the elements of the second set. In particular,


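$$\begin{aligned}
\sum_{\substack{n\ge1\\ n\text{ a positive integer}}}\frac{1}{n^s}
&= \sum_{a_1,a_2,\dots,a_k\ge0}\frac{1}{(p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k})^s}\\
&= \Big(\sum_{a_1\ge0}\frac{1}{(p_1^{a_1})^s}\Big)\Big(\sum_{a_2\ge0}\frac{1}{(p_2^{a_2})^s}\Big)\cdots\Big(\sum_{a_k\ge0}\frac{1}{(p_k^{a_k})^s}\Big)\\
&= \Big(1-\frac{1}{p_1^s}\Big)^{-1}\Big(1-\frac{1}{p_2^s}\Big)^{-1}\cdots\Big(1-\frac{1}{p_k^s}\Big)^{-1}.
\end{aligned}$$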


The last equality holds because each sum in the second-to-last line is over a geometric progression. Euler proved that if we take $s = 1$, then the left-hand side becomes $\sum_{n\ge1}\frac1n$, which diverges, as we saw in appendix 4D, and the right-hand side equals a finite product of rational numbers, $\prod_{i=1}^{k}\frac{p_i}{p_i-1}$, which is itself a rational (and so finite) number. This contradiction implies that there cannot be finitely many primes.
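Euler's argument can be watched numerically. Expanding $\prod_{p\le y}(1-1/p)^{-1}$ gives $\sum 1/n$ over all $n$ whose prime factors are all $\le y$, which includes every $n \le y$; so this finite product is at least the harmonic sum $\sum_{n\le y}1/n > \log y$, and it grows without bound as more primes are included. A minimal Python sketch (helper names our own):

```python
from math import isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def euler_comparison(y: int) -> tuple[float, float]:
    """Return (prod_{p <= y} (1 - 1/p)^(-1), sum_{n <= y} 1/n)."""
    product = 1.0
    for p in primes_up_to(y):
        product *= p / (p - 1)          # (1 - 1/p)^(-1) = p/(p - 1)
    harmonic = sum(1 / n for n in range(1, y + 1))
    return product, harmonic

for y in (10, 100, 1000, 10_000):
    product, harmonic = euler_comparison(y)
    print(f"y={y:>6}  product={product:8.3f}  harmonic={harmonic:7.3f}  log y={log(y):6.3f}")
```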




What is wonderful about Euler's formula is that something like it holds without any assumption about the number of primes. So

$$\sum_{\substack{n\ge1\\ n\text{ a positive integer}}}\frac{1}{n^s} = \prod_{p\text{ prime}}\Big(1-\frac{1}{p^s}\Big)^{-1} \tag{5.12.1}$$

holds whenever both sides are absolutely convergent, which we will now show occurs when $\mathrm{Re}(s) > 1$. (This is the simplest case of the general Dirichlet series studied in section 4.9 of appendix 4B, though we will now be more careful about these convergence issues.) For the left-hand side, if $s = \sigma + it$ with $\sigma > 1$, then the sum of the absolute values of the terms converges, as


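$$\sum_{n\ge1}\Big|\frac{1}{n^s}\Big| = \sum_{n\ge1}\frac{1}{n^\sigma} \le 1 + \int_1^\infty\frac{dt}{t^\sigma} = 1 + \Big[\frac{t^{1-\sigma}}{1-\sigma}\Big]_1^\infty = 1 + \frac{1}{\sigma-1} = \frac{\sigma}{\sigma-1}.$$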


Here we have used that $n^{-\sigma} \le \int_{n-1}^{n}t^{-\sigma}\,dt$ for each $n \ge 2$, since $t^{-\sigma}$ is a decreasing function of $t$ (much as in the discussion in section 4.14 of appendix 4D of the more difficult case when $\sigma = 1$).

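A product $\prod_{j\ge1}(1-a_j)$ with each $|a_j| < 1$ converges absolutely if $\sum_{j\ge1}|a_j|$ converges. As each $|p^{-s}| = p^{-\sigma} < p^{-1} < 1$, we deduce that the Euler product in (5.12.1) converges absolutely, as

$$\sum_{p\text{ prime}}\Big|\frac{1}{p^s}\Big| = \sum_{p\text{ prime}}\frac{1}{p^\sigma} \le \sum_{n\ge1}\frac{1}{n^\sigma} \le \frac{\sigma}{\sigma-1}$$

by the last displayed equation.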

We have just seen that (5.12.1) makes sense when $s$ is to the right of the vertical line in the complex plane going through the point 1. Like Euler, we want to be able to interpret what happens to (5.12.1) when $s = 1$. To avoid falling afoul of convergence issues we take the limit of both sides as $\sigma \to 1^+$ through real values, since (5.12.1) holds for real $\sigma > 1$. Now $1/n^\sigma \ge \int_n^{n+1}dt/t^\sigma$ for each $n$, as $1/t^\sigma$ is a decreasing function and the integral is over an interval of length 1, and so

$$\sum_{n\ge1}\frac{1}{n^\sigma} \ge \int_1^\infty\frac{dt}{t^\sigma} = \frac{1}{\sigma-1}.$$
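Together with the upper bound from absolute convergence, this traps the sum between $\frac{1}{\sigma-1}$ and $\frac{\sigma}{\sigma-1}$, so $(\sigma-1)\sum_{n\ge1}n^{-\sigma} \to 1$ as $\sigma \to 1^+$. To check the bounds numerically, this minimal Python sketch approximates the infinite sum by a partial sum plus the leading Euler-Maclaurin terms for the tail; that tail approximation is our own assumption, not part of the argument above.

```python
from math import pi

def zeta_real(sigma: float, N: int = 100_000) -> float:
    """Approximate sum_{n >= 1} n^(-sigma) for real sigma > 1: the partial sum up to N
    plus the Euler-Maclaurin tail terms N^(1-sigma)/(sigma-1) - N^(-sigma)/2."""
    partial = sum(n ** -sigma for n in range(1, N + 1))
    return partial + N ** (1 - sigma) / (sigma - 1) - N ** -sigma / 2

for sigma in (1.01, 1.1, 1.5, 2.0, 3.0):
    z = zeta_real(sigma)
    print(f"sigma={sigma:<5} lower={1 / (sigma - 1):9.4f}  sum={z:9.4f}  upper={sigma / (sigma - 1):9.4f}")
```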


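Then (5.12.1) implies that

$$\prod_{p\text{ prime}}\Big(1-\frac{1}{p^\sigma}\Big) \le \sigma-1,$$

so that

$$\prod_{p\text{ prime}}\Big(1-\frac{1}{p}\Big) = \lim_{\sigma\to1^+}\prod_{p\text{ prime}}\Big(1-\frac{1}{p^\sigma}\Big) \le \lim_{\sigma\to1^+}(\sigma-1) = 0. \tag{5.12.2}$$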


Upon taking logarithms this implies Euler's famous result that

$$\sum_{p\text{ prime}}\frac{1}{p}\ \text{ diverges.} \tag{5.12.3}$$




An explicit estimate for the sum in (5.12.3). It is useful to estimate the sum of $1/p$ over the primes $p \le x$. One can do so using the (difficult to prove) prime number theorem, but one can also obtain the following good estimate from more elementary methods (see chapter 4 of [Graa]): There exists a constant $c$ such that

$$\lim_{x\to\infty}\Big(\sum_{\substack{p\text{ prime}\\ p\le x}}\frac{1}{p} - \log\log x\Big) = c. \tag{5.12.4}$$
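The convergence in (5.12.4) is easy to observe. The constant $c$ is known as the Meissel-Mertens constant, $c \approx 0.2615$; we quote that value as a known fact rather than derive it. A minimal Python sketch (helper names our own):

```python
from math import isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def mertens_difference(x: int) -> float:
    """sum_{p <= x} 1/p - log log x, which tends to the constant c of (5.12.4)."""
    return sum(1 / p for p in primes_up_to(x)) - log(log(x))

for x in (10**3, 10**4, 10**5, 10**6):
    print(x, round(mertens_difference(x), 5))
```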


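The difference between $\log\prod_{p\le x}\big(1-\frac1p\big)^{-1}$ and the first term in its approximation, $\sum_{p\le x}\frac1p$, is a sum of positive terms: since $\log\big(1-\frac1p\big)^{-1} - \frac1p = \sum_{k\ge2}\frac{1}{kp^k} \le \frac{1}{p(p-1)}$, their total is $\le \sum_{n\ge2}\frac{1}{n(n-1)} = \sum_{n\ge2}\big(\frac{1}{n-1}-\frac1n\big) = 1$. The difference therefore converges to a limit as $x \to \infty$. Exponentiating, we deduce that there exists a constant $\gamma$ such that

$$\prod_{\substack{p\text{ prime}\\ p\le x}}\Big(1-\frac1p\Big) \sim \frac{e^{-\gamma}}{\log x},$$

an explicit improvement on (5.12.2). (This estimate appeared in section 5.2.)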
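This says that $\log x\cdot\prod_{p\le x}(1-\frac1p)$ tends to a positive constant. By a theorem of Mertens (not proved here) that constant is $e^{-\gamma} \approx 0.5615$, where $\gamma = 0.5772\ldots$ is Euler's constant. This minimal Python sketch checks the numbers, taking the value of Euler's constant as a given input.

```python
from math import exp, isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

EULER_GAMMA = 0.5772156649015329   # Euler's constant, supplied as a known value

def mertens_product(x: int) -> float:
    """log x * prod_{p <= x} (1 - 1/p)."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1 - 1 / p
    return product * log(x)

for x in (10**3, 10**4, 10**5, 10**6):
    print(x, round(mertens_product(x), 5))
print("e^(-gamma) =", round(exp(-EULER_GAMMA), 5))
```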


Another use of (5.12.4) is to deduce that the number of steps in the sieve of Eratosthenes (as in section 5.2) is

$$\frac{x}{2}+\frac{x}{3}+\cdots+\frac{x}{p_k} = \sum_{\substack{p\text{ prime}\\ p\le\sqrt x}}\frac{x}{p} < x(\log\log x + C)$$

for any constant $C > c - \log 2$, once $x$ is sufficiently large, where $p_k$ is the largest prime $\le \sqrt{x}$.
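To compare this count with an actual run, this minimal Python sketch crosses off $2p, 3p, \dots \le x$ for each prime $p \le \sqrt x$, counting one step per crossing, so prime $p$ costs $[x/p]-1$ steps, essentially the $x/p$ above. The approximate value $c \approx 0.2615$ is assumed known, not derived here.

```python
from math import isqrt, log

def sieve_with_step_count(x: int) -> tuple[list[int], int]:
    """Sieve of Eratosthenes up to x; for each prime p <= sqrt(x) cross off 2p, 3p, ...,
    counting one step per multiple crossed off ([x/p] - 1 steps for p)."""
    is_p = bytearray([1]) * (x + 1)
    is_p[0:2] = b"\x00\x00"
    steps = 0
    for p in range(2, isqrt(x) + 1):
        if is_p[p]:
            multiples = range(2 * p, x + 1, p)
            is_p[2 * p :: p] = bytearray(len(multiples))
            steps += len(multiples)
    return [n for n in range(x + 1) if is_p[n]], steps

C_MERTENS = 0.2615   # approximate value of the constant c in (5.12.4), assumed known

for x in (10**4, 10**5, 10**6):
    primes, steps = sieve_with_step_count(x)
    print(x, len(primes), steps, round(steps / x, 4), round(log(log(x)) - log(2) + C_MERTENS, 4))
```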


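Another derivation of (5.12.1). One begins with $\sum_{n\ge1}\frac{1}{n^s}$, the sum of $1/n^s$ over all integers $n \ge 1$. Now suppose that we wish to remove the even integers from this sum. Their contribution to this sum is

$$\sum_{\substack{n\ge1\\ n\text{ even}}}\frac{1}{n^s} = \sum_{m\ge1}\frac{1}{(2m)^s} = \frac{1}{2^s}\sum_{m\ge1}\frac{1}{m^s},$$

writing even $n$ as $2m$, and hence

$$\sum_{\substack{n\ge1\\ n\text{ odd}}}\frac{1}{n^s} = \sum_{n\ge1}\frac{1}{n^s} - \sum_{\substack{n\ge1\\ n\text{ even}}}\frac{1}{n^s} = \Big(1-\frac{1}{2^s}\Big)\sum_{n\ge1}\frac{1}{n^s}.$$

If we wish to remove the multiples of 3, we can proceed similarly, to obtain

$$\sum_{\substack{n\ge1\\ (n,2\cdot3)=1}}\frac{1}{n^s} = \Big(1-\frac{1}{2^s}\Big)\Big(1-\frac{1}{3^s}\Big)\sum_{n\ge1}\frac{1}{n^s},$$

and for arbitrary $y$, letting $m = \prod_{p\le y}p$,

$$\sum_{\substack{n\ge1\\ (n,m)=1}}\frac{1}{n^s} = \prod_{p\le y}\Big(1-\frac{1}{p^s}\Big)\sum_{n\ge1}\frac{1}{n^s}.$$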




As $y \to \infty$, the left side becomes the sum over all integers $n \ge 1$ which do not have any prime factors. The only such integer is $n = 1$, so the left-hand side becomes $1/1^s = 1$. Hence

$$\prod_{p\text{ prime}}\Big(1-\frac{1}{p^s}\Big)\sum_{n\ge1}\frac{1}{n^s} = 1,$$

an alternative formulation of (5.12.1). The advantage of this proof is that we see what happens when we "sieve" by various primes, that is, when we remove the integers from our set that are divisible by given primes.
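The sieving identity can be checked numerically at $s = 2$, where $\sum_{n\ge1}n^{-2} = \pi^2/6$: removing the integers with a prime factor $\le y$ should multiply the sum by $\prod_{p\le y}(1-p^{-2})$. This minimal Python sketch truncates the sieved sum at $N$ (so it is only an approximation) and uses our own helper names.

```python
from math import gcd, isqrt, pi

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def sieved_sum(s: float, y: int, N: int) -> float:
    """Sum of n^(-s) over n <= N having no prime factor <= y (i.e. (n, m) = 1)."""
    m = 1
    for p in primes_up_to(y):
        m *= p
    return sum(n ** -s for n in range(1, N + 1) if gcd(n, m) == 1)

def euler_factor(s: float, y: int) -> float:
    """prod_{p <= y} (1 - p^(-s))."""
    result = 1.0
    for p in primes_up_to(y):
        result *= 1 - p ** -s
    return result

zeta2 = pi ** 2 / 6
for y in (2, 3, 7, 13):
    print(y, round(sieved_sum(2, y, 10**5), 6), round(euler_factor(2, y) * zeta2, 6))
print(euler_factor(2, 10**4) * zeta2)   # close to 1 once y is large
```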


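Exercise 5.12.1. Show that if $\mathrm{Re}(s) > 1$, then

$$\Big(1-\frac{2}{2^s}\Big)\sum_{n\ge1}\frac{1}{n^s} = \sum_{\substack{n\ge1\\ n\text{ odd}}}\frac{1}{n^s} - \sum_{\substack{n\ge1\\ n\text{ even}}}\frac{1}{n^s}.$$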


Reference on Euler’s many contributions 
[1] Raymond Ayoub, Euler and the zeta function, Amer. Math. Monthly 81 (1974), 1067-1086. 


5.13. The sieve of Eratosthenes and estimates for the primes up to x 


Fix $\varepsilon > 0$ to be an arbitrarily small positive constant. By (5.12.2) we know that there exists $y$ such that

$$\prod_{p\le y}\Big(1-\frac1p\Big) < \frac{\varepsilon}{3}.$$

Let $m$ be the product of the primes $\le y$, and select $x$ to be sufficiently large; in particular, $x \ge m$ and $x > 3y/\varepsilon$. Any prime $> y$ must be coprime to $m$, the product of the primes $\le y$, and so

$$\pi(x) - \pi(y) = \#\{\text{primes } p \in (y, x]\} \le \sum_{\substack{n\le x\\ (n,m)=1}}1.$$


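Obviously $\pi(y) \le y$. We let $k = [x/m]$, so that $km \le x < 2km$, and therefore, writing $n = jm + i$,

$$\sum_{\substack{n\le x\\ (n,m)=1}}1 \le \sum_{\substack{n\le 2km\\ (n,m)=1}}1 = \sum_{j=0}^{2k-1}\sum_{\substack{1\le i\le m\\ (jm+i,m)=1}}1 = \sum_{j=0}^{2k-1}\sum_{\substack{1\le i\le m\\ (i,m)=1}}1 = 2k\phi(m).$$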


We deduce that

$$\pi(x) \le y + 2k\phi(m) = y + 2km\prod_{p\le y}\Big(1-\frac1p\Big) < \frac{\varepsilon x}{3} + 2x\cdot\frac{\varepsilon}{3} = \varepsilon x.$$


Since this holds for any $\varepsilon > 0$, we deduce that

$$\lim_{x\to\infty}\frac{1}{x}\#\{p \le x : p \text{ prime}\} = 0; \tag{5.13.1}$$

that is, a vanishing proportion of the integers are prime.
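Each inequality in this argument can be checked for a concrete $x$. With $m = \prod_{p\le y}p$ and $k = [x/m]$, this minimal Python sketch (helper names our own) compares $\pi(x)$, the number of $n \le x$ coprime to $m$, and the bound $2k\phi(m)$. The ratio $2k\phi(m)/x$ decreases only slowly as $y$ grows, reflecting how slowly the product in (5.12.2) tends to $0$.

```python
from math import gcd, isqrt

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def sieve_bound(x: int, y: int) -> tuple[int, int]:
    """With m = prod_{p <= y} p and k = [x/m], return (#{n <= x : (n, m) = 1}, 2 k phi(m))."""
    m = phi = 1
    for p in primes_up_to(y):
        m *= p
        phi *= p - 1            # phi(m) = prod (p - 1) since m is squarefree
    if x < m:
        raise ValueError("need x >= m, so that k = [x/m] >= 1")
    k = x // m
    coprime = sum(1 for n in range(1, x + 1) if gcd(n, m) == 1)
    return coprime, 2 * k * phi

x = 10**5
pi_x = len(primes_up_to(x))
results = {y: sieve_bound(x, y) for y in (2, 3, 5, 7, 11, 13)}
for y, (coprime, bound) in results.items():
    print(f"y={y:>2}  pi(x)={pi_x}  coprime={coprime}  2k*phi(m)={bound}  bound/x={bound / x:.4f}")
```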




5.14. Riemann's plan for Gauss's prediction, I


In 1859, Riemann wrote a nine-page memoir that was to shape the future of number theory. This was his only paper in number theory, yet its ideas guided the study of the distribution of prime numbers from then on. Riemann proposed a plan to prove Gauss's guesstimate for the number of primes up to $x$, discussed in section 5.4. This involved moving the question from number theory to analysis via the theory of analytic continuation.


We define the Riemann zeta-function $\zeta(s)$ by

$$\zeta(s) = \sum_{n\ge1}\frac{1}{n^s},$$

an infinite sum whose value is well-defined when the sum is absolutely convergent, that is, for $\mathrm{Re}(s) > 1$, as discussed in section 5.12. Our starting point is Euler's formula (5.12.1), which connects the Riemann zeta-function to the set of primes:

$$\zeta(s) = \prod_{p\text{ prime}}\Big(1-\frac{1}{p^s}\Big)^{-1}.$$


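Taking logarithms of both sides, we get

$$\log\zeta(s) = \sum_{p\text{ prime}}\log\Big(1-\frac{1}{p^s}\Big)^{-1} = \sum_{p\text{ prime}}\sum_{k\ge1}\frac{1}{k}\cdot\frac{1}{(p^k)^s}.$$

The coefficient of $1/n^s$ is 0 unless $n$ is a power of a prime $p$, say $p^k$, in which case the coefficient is $1/k$. In other words, if we let $a_n = 1/k$ if $n = p^k$ is a prime power, and $a_n = 0$ otherwise (a weighted characteristic function for the prime powers), then

$$\log\zeta(s) = \sum_{n\ge1}\frac{a_n}{n^s}.$$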
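A quick numerical check of this expansion at $s = 2$, using the known value $\zeta(2) = \pi^2/6$: summing $a_n/n^2$ over $n \le N$ should be very close to $\log(\pi^2/6) \approx 0.4977$. A minimal Python sketch (helper names our own):

```python
from math import isqrt, log, pi

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def log_zeta_coefficients(N: int) -> list[float]:
    """a_n = 1/k if n = p^k is a prime power (k >= 1), and a_n = 0 otherwise, for n <= N."""
    a = [0.0] * (N + 1)
    for p in primes_up_to(N):
        pk, k = p, 1
        while pk <= N:
            a[pk] = 1 / k
            pk, k = pk * p, k + 1
    return a

N = 10**5
a = log_zeta_coefficients(N)
series = sum(a[n] / n**2 for n in range(2, N + 1))
print(series, log(pi ** 2 / 6))   # truncated series versus log zeta(2)
```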

What we need now is a way to extract $\sum_{n\le x}a_n$, the sum of the coefficients up to $x$ of such a Dirichlet series, and this, Riemann realized, was provided by an idea of Perron, as long as one could analytically continue the Dirichlet series well to the left of the line $\mathrm{Re}(s) = 1$. If one can do this, and this is a big "if", then one gets a formula for the number of prime powers up to $x$ in terms of the zeros of the analytic continuation of $\zeta(s)$ (which are singularities of $\log\zeta(s)$), something we will discuss more in section 5.16 of appendix 5D.


Appendix 5C. What should be 
true about primes? 


Gauss suggested (see section 5.4) that the density of primes near $x$ should be about $1/\log x$, so that in an interval of length $y$ around $x$, we expect that

$$\#\{p \text{ prime} : x < p \le x+y\}\ \text{ is about }\ \frac{y}{\log x}. \tag{5.15.1}$$

This makes no sense if $y$ is, say, $\frac12\log x$, as you can't have half a prime. Does it make sense for larger $y$? And is there a way to interpret Gauss's suggestion to better understand $\pi(x+y) - \pi(x)$ when $y = \frac12\log x$?


5.15. The Gauss-Cramér model for the primes 


The great probabilist Cramér interpreted Gauss's suggestion as the following model for the sequence of primes: Let $X_3, X_4, \dots$ be an infinite sequence of independent random variables such that

$$\mathrm{Prob}(X_n = 1) = \frac{1}{\log n}\quad\text{and}\quad \mathrm{Prob}(X_n = 0) = 1 - \frac{1}{\log n}.$$
This defines a probability space on all infinite sequences of 0's and 1's. The indicator function for the odd primes is such a sequence:

n:     3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  ...
X_n:   1  0  1  0  1  0  0   0   1   0   1   0   0   0   1  ...

where the 1's correspond to 3, 5, 7, 11, 13, 17, .... Cramér proposed that one should think of this as a "typical" sequence of 0's and 1's under this probability measure, so that anything that can be said with probability 1 for the space of such sequences should be true for the sequence of primes. First notice that the expectation for the sum up to $N$, which we would guess would be a good approximation for the number of primes up to $N$, is


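$$\mathbb{E}\Big(\sum_{n=3}^{N}X_n\Big) = \sum_{n=3}^{N}\mathbb{E}(X_n) = \sum_{n=3}^{N}\frac{1}{\log n},$$

which equals $\mathrm{Li}(N)$ to within an error of 1, so that works! But it is more important to ask whether the sum is usually near to its expected value. We will prove that, for any fixed $\varepsilon > 0$, with probability $\to 1$ as $N \to \infty$,

$$\Big|\sum_{n=3}^{N}X_n - \int_2^N\frac{dt}{\log t}\Big| \le N^{1/2+\varepsilon}. \tag{5.15.2}$$

This fits very well with what we observed about the data for the sequence of primes. Moreover, in the next appendix we will see that this being true about the primes is equivalent to the famous "Riemann Hypothesis". To prove (5.15.2), one calculates the variance, a key quantity in probability theory:

$$\mathbb{E}\bigg(\Big(\sum_{n=3}^{N}X_n - \sum_{n=3}^{N}\frac{1}{\log n}\Big)^2\bigg) = \mathbb{E}\bigg(\Big(\sum_{n=3}^{N}X_n\Big)^2\bigg) - \Big(\sum_{n=3}^{N}\frac{1}{\log n}\Big)^2.$$

The first term here equals

$$\sum_{n=3}^{N}\mathbb{E}(X_n^2) + \sum_{3\le m\ne n\le N}\mathbb{E}(X_m)\,\mathbb{E}(X_n) = \sum_{n=3}^{N}\frac{1}{\log n} + \sum_{3\le m\ne n\le N}\frac{1}{\log m\,\log n},$$

since $X_n^2 = X_n$ and the $X_n$ are independent, and therefore the variance equals

$$\sum_{n=3}^{N}\Big(\frac{1}{\log n} - \frac{1}{(\log n)^2}\Big),$$

which can be shown to be $\le 2N/\log N$. Now, for any random variable $Y$ and any $T > 0$, we have

$$\mathbb{E}(Y^2) = \int_0^\infty t^2\,d\,\mathbb{P}(|Y| \le t) \ \ge\ T^2\cdot\mathbb{P}(|Y| > T).$$

(Here, $\mathbb{P}(\text{"event"})$ means the probability that the "event" takes place.) Therefore, taking $Y = \sum_{n=3}^{N}X_n - \mathrm{Li}(N)$ and $T = N^{1/2+\varepsilon}$, we deduce from the above that (5.15.2) fails with probability $\le N^{-2\varepsilon}$ once $N$ is sufficiently large.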
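The model is easy to simulate. This minimal Python sketch draws a few samples of $X_3 + \cdots + X_N$, compares each with the expected value $\sum_{n\le N}1/\log n$, and checks that the deviation is within $N^{1/2+\varepsilon}$, the size of deviation controlled by the variance argument. The seed, $N$ and $\varepsilon$ are arbitrary illustrative choices of ours. For comparison, $\pi(10^5) = 9592$.

```python
import random
from math import log

def cramer_count(N: int, rng: random.Random) -> int:
    """One sample of X_3 + ... + X_N, where X_n = 1 with probability 1/log n."""
    return sum(1 for n in range(3, N + 1) if rng.random() < 1 / log(n))

def expected_count(N: int) -> float:
    return sum(1 / log(n) for n in range(3, N + 1))

def variance(N: int) -> float:
    return sum(1 / log(n) - 1 / log(n) ** 2 for n in range(3, N + 1))

N, eps = 10**5, 0.1
rng = random.Random(2019)          # fixed seed so the run is reproducible
mean = expected_count(N)
samples = [cramer_count(N, rng) for _ in range(5)]
print(round(mean, 1), samples)
print("variance <= 2N/log N:", variance(N) <= 2 * N / log(N))
print("max deviation:", round(max(abs(s - mean) for s in samples), 1), "vs N^(1/2+eps) =", round(N ** (0.5 + eps)))
```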


By modifying this argument one can prove an analogous estimate for short intervals, which holds, for any fixed $\varepsilon > 0$, with probability $\to 1$ as $x \to \infty$; and so (5.15.1) should be about right provided $y \ge (\log x)^{2+\varepsilon}$, a rather short interval. However we are very far from proving anything like this for the primes themselves. A more careful analysis of this kind even suggests that

$$\pi(x+y) - \pi(x) \sim \frac{y}{\log x}$$

for 100% of the intervals $(x, x+y]$ with $x \le X$, where $y = y(x)$ is a function of $x$ for which $y/\log x \to \infty$ as $x \to \infty$. The "100%" here does not mean the same thing as "all". This is 100% in the sense that this estimate is true for a certain proportion of intervals up to $X$, and this proportion tends to 100% as $X \to \infty$.


Short intervals 


For intervals of length $\frac12\log x$, our calculation should be interpreted as meaning that at least half of such intervals contain no primes, a reasonable proportion of such intervals contain one or two primes, and very few contain a large number of primes, all so the average is $\frac12$. This situation has been considered by probability theorists and gives rise to a Poisson distribution. It suggests that for any fixed real number $t > 0$ and any fixed integer $k \ge 0$, with probability that tends to 1 as $X \to \infty$,

$$\mathbb{P}\big((x, x+t\log x],\ \text{where } x \le X,\ \text{contains exactly } k \text{ primes}\big) \approx \frac{e^{-t}t^k}{k!}.$$

This probability is largest when $k$ is the integer part of $t$ (when $t$ is itself an integer, $k = t-1$ and $k = t$ tie), much as one might guess.
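The Poisson probabilities are simple to tabulate. This minimal Python sketch (helper names our own) prints $e^{-t}t^k/k!$ for small $k$ and the most likely $k$. For example, at $t = 2.9$ the most likely count is $2$, not the nearest integer $3$. At $t = \frac12$ the probability of no primes is $e^{-1/2} \approx 0.61$, in line with "at least half".

```python
from math import exp, factorial

def poisson_pmf(k: int, t: float) -> float:
    """e^(-t) t^k / k!: the predicted proportion of intervals (x, x + t log x]
    containing exactly k primes."""
    return exp(-t) * t**k / factorial(k)

def most_likely_count(t: float) -> int:
    """The k maximizing poisson_pmf(k, t); the maximum occurs at k <= int(t) + 1."""
    return max(range(int(t) + 2), key=lambda k: poisson_pmf(k, t))

for t in (0.5, 1.5, 2.9, 3.4):
    probs = [round(poisson_pmf(k, t), 3) for k in range(6)]
    print(t, most_likely_count(t), probs)
```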


One can also turn to the Gauss-Cramér model to guess how big $y$ needs to be, in terms of $x$, to guarantee a prime in $(x, x+y]$ for every $x$. That is tantamount to asking for the largest gaps between consecutive primes. Now, if $x$ and $y$ are integers, then

$$\mathbb{P}\big(X_n = 0 \text{ for all } n \in (x, x+y]\big) = \prod_{n=x+1}^{x+y}\mathbb{P}(X_n = 0) = \prod_{n=x+1}^{x+y}\Big(1-\frac{1}{\log n}\Big).$$

If $y$ is not very large, then $\log n$ is very close to $\log x$, and so we can approximate this product by

$$\Big(1-\frac{1}{\log x}\Big)^y, \quad\text{which is roughly } e^{-y/\log x}.$$

Therefore if $X < x \le 2X$ and $y$ is "small", then the probability that $X_n = 1$ for some $n \in (x, x+y]$ is roughly $1 - e^{-y/\log x}$. In this context we want this to be very close to 1 (which implies that $y$ is substantially larger than $\log X$). There are $X$ intervals of the form $(x, x+y]$ with $x$ an integer for which $X < x \le 2X$, and so the probability that all of them contain at least one prime is roughly

$$\big(1 - e^{-y/\log X}\big)^X, \quad\text{which is roughly } e^{-Xe^{-y/\log X}}.$$


This quantity is very close to 1 if $y$ is a little bit bigger than $(\log X)^2$, and it is very close to 0 if $y$ is a little bit smaller than $(\log X)^2$. Cramér developed this into a formal proof that, with probability 1, if $n_1 < n_2 < n_3 < \cdots$ are the indices $n$ for which $X_n = 1$, then

$$\max_{n_k \le X}(n_{k+1} - n_k) \sim (\log X)^2 \quad\text{as } X \to \infty.$$

Therefore Cramér conjectured that the largest gap between consecutive primes $p_n < p_{n+1}$ should be about $\log^2 p_n$. In other words, if $p_1 = 2 < p_2 = 3 < \cdots$ is the sequence of prime numbers, then

$$\max_{p_n \le x}(p_{n+1} - p_n) \sim (\log x)^2.$$




Here are the record-breaking gaps, in terms of the ratio of these two quantities:

p_n                  p_{n+1} - p_n    (p_{n+1} - p_n)/log² p_n
113                     14            .6264
1327                    34            .6576
31397                   72            .6715
370261                 112            .6812
2010733                148            .7026
20831323               210            .7395
25056082087            456            .7953
2614941710599          652            .7975
19581334192423         766            .8178
218209405436543        906            .8311
1693182318746371      1132            .9206

(Known) record-breaking gaps between primes


We see from the data that the constant is slowly creeping upwards but will it ever 
reach 1? And will it go beyond? We don’t know. 
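The first rows of the table can be recomputed directly with a sieve, as in this minimal Python sketch (helper names our own). The larger rows require much bigger computations and are not attempted here.

```python
from math import isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

# (p_n, gap, ratio) for the rows of the table with p_n below 10^6.
TABLE = [(113, 14, 0.6264), (1327, 34, 0.6576), (31397, 72, 0.6715), (370261, 112, 0.6812)]

primes = primes_up_to(371_000)
next_prime = dict(zip(primes, primes[1:]))
for p, gap, ratio in TABLE:
    g = next_prime[p] - p
    print(p, g, round(g / log(p) ** 2, 4), "| table:", gap, ratio)
```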


Twin primes 


The Gauss-Cramér model suggests that the number of prime pairs $p, p+k$ with $p \le N$ is about

$$\mathbb{E}\Big(\sum_{n=3}^{N}X_nX_{n+k}\Big) = \sum_{n=3}^{N}\mathbb{E}(X_n)\,\mathbb{E}(X_{n+k}) = \sum_{n=3}^{N}\frac{1}{\log n\,\log(n+k)},$$

which is $\sim N/(\log N)^2$. One gets the same number whether $k = 1$ (when there is just one such prime pair) or $k = 2$ (when we expect infinitely many), so the model gives an incorrect prediction, because it does not take account of divisibility by 2. We can modify the Gauss-Cramér model to take account of divisibility by "small" primes. When asking questions about primes around $x$ we will take $X_n = 0$ if $n$ has a prime factor $\le z$, for an appropriate "small" number $z$. This new model will not yield the same prediction for the number of primes up to $x$ as above unless, when $n$ has no prime factor $\le z$, we now have

$$\mathrm{Prob}(X_n = 1) = \frac{\kappa}{\log n}\quad\text{and}\quad \mathrm{Prob}(X_n = 0) = 1 - \frac{\kappa}{\log n},$$

where $\kappa = \kappa(z) = \prod_{p\le z}\frac{p}{p-1}$. With this modified model we avoid pairs $X_n = X_{n+k} = 1$ with $k$ odd and small, and we are even able to predict that the number of prime pairs $p, p+k \le N$ with $k$ even should be about

$$\mathbb{E}\Big(\sum_{\substack{n\le N\\ (n(n+k),m)=1}}X_nX_{n+k}\Big) = \sum_{\substack{n\le N\\ (n(n+k),m)=1}}\frac{\kappa}{\log n}\cdot\frac{\kappa}{\log(n+k)} \sim C_k\frac{N}{(\log N)^2},$$

where $m = \prod_{p\le z}p$ and

$$C_k = 2\prod_{p\text{ odd prime}}\Big(1-\frac{1}{(p-1)^2}\Big)\prod_{\substack{p\mid k\\ p\text{ odd}}}\frac{p-1}{p-2}.$$
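The constant $C_k$ can be computed by truncating the infinite product, and the prediction compared with an actual count of twin primes. This minimal Python sketch (helper names our own, product truncated at $p \le 10^5$) does both. At this small $N$ the count is noticeably larger than $C_2N/(\log N)^2$, since $N/(\log N)^2$ is only the leading-order term, so the two should not be expected to agree closely. Note also that $C_6 = 2C_2$: the model predicts twice as many prime pairs $p, p+6$ as twin primes.

```python
from math import isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def twin_constant(limit: int) -> float:
    """C_2 = 2 * prod over odd primes p of (1 - 1/(p-1)^2), truncated at p <= limit."""
    c = 2.0
    for p in primes_up_to(limit)[1:]:          # skip p = 2
        c *= 1 - 1 / (p - 1) ** 2
    return c

def C(k: int, limit: int = 10**5) -> float:
    """C_k for even k: C_2 times prod over odd primes p dividing k of (p-1)/(p-2)."""
    c = twin_constant(limit)
    for p in primes_up_to(k)[1:]:
        if k % p == 0:
            c *= (p - 1) / (p - 2)
    return c

N = 10**5
ps = primes_up_to(N + 2)
prime_set = set(ps)
twins = sum(1 for p in ps if p <= N and p + 2 in prime_set)
prediction = C(2) * N / log(N) ** 2
print(twins, round(prediction), round(C(2), 4), round(C(6) / C(2), 4))
```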




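We can also use this model to predict the number of pairs of primes $p, q$ for which $p + q = N$ in Goldbach's conjecture, which should be about

$$\mathbb{E}\Big(\sum_{\substack{3\le n\le N-3\\ (n(N-n),m)=1}}X_nX_{N-n}\Big) = \sum_{\substack{3\le n\le N-3\\ (n(N-n),m)=1}}\frac{\kappa}{\log n}\cdot\frac{\kappa}{\log(N-n)} \sim C_N\frac{N}{(\log N)^2}.$$

Computations suggest that these predictions are about right.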


Appendix 5D. Working with 
Riemann’s zeta-function 


In section 5.4 the data led us to believe that $\pi(x)$, the number of primes up to $x$, is best approximated by the complicated integral $\mathrm{Li}(x) = \int_2^x\frac{dt}{\log t}$. This is not an easy function to work with (see exercises 5.8.11 and 5.18.2). If, instead, we count the primes with the weight $\log p$ attached to prime $p$, then the expected value of

$$\sum_{p\le x}\log p \quad\text{is}\quad \sum_{n=2}^{x}\log n\cdot\frac{1}{\log n} = x - 1,$$

a far easier function to manipulate. If one can estimate the weighted sum well, then one can deduce good estimates for $\pi(x)$ by a mathematical technique called partial summation (see the book [Graal]).
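The weighted count $\sum_{p\le x}\log p$ really does track $x$ closely, as this minimal Python sketch shows (the helper names are our own).

```python
from math import isqrt, log

def primes_up_to(n: int) -> list[int]:
    """All primes p <= n, by the sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if is_p[p]:
            is_p[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if is_p[i]]

def theta(x: int) -> float:
    """Sum of log p over the primes p <= x."""
    return sum(log(p) for p in primes_up_to(x))

for x in (10**3, 10**4, 10**5, 10**6):
    t = theta(x)
    print(x, round(t, 1), round(t / x, 5))
```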


5.16. Riemann’s plan for Gauss’s prediction 


Riemann came up with the most surprising plan to try to verify Gauss's prediction by using complex analysis, that is, the theory of calculus in the complex plane.²⁴ Riemann begins with an extraordinary identity (due to Perron): For any "reasonably behaved" sequence of complex numbers $\{a_n\}_{n\ge1}$ we have

$$\sum_{n\le x}a_n = \frac{1}{2\pi i}\int_{2-i\infty}^{2+i\infty}\Big(\sum_{n\ge1}\frac{a_n}{n^s}\Big)\frac{x^s}{s}\,ds \qquad (x \text{ not an integer}).$$

Here the integral is along the straight line going from $2-i\infty$ to $2+i\infty$. The key point to appreciate is that in order to evaluate the finite sum on the left, we use the infinite integral on the right, which involves a Dirichlet series, itself an infinite sum: an identity that seems only to (vastly) complicate matters. However the integral on the right can be approached with the tools of calculus and sometimes

²⁴This is not the place to discuss this intriguing theory in detail, but we will use from it, and elucidate, the ideas that we need to describe Riemann's plan.




this yields a rich dividend. If we wish to count the primes with the log weight suggested above, then the integrand on the right-hand side of our identity involves the Dirichlet series

$$\sum_{p\text{ prime}}\frac{\log p}{p^s},$$

where the only non-zero coefficients of $1/n^s$ occur when $n$ is a prime. This is more or less $-\frac{\zeta'(s)}{\zeta(s)}$, which was discussed in section 4.11 of appendix 4B, the only difference being that the latter also has non-zero coefficients when $n$ is a prime power $p^m$ with $m \ge 2$ (extra terms which will have little impact on our discussion). So $-\frac{\zeta'(s)}{\zeta(s)}$ is fit for Riemann's purpose.²⁵
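The coefficients of $-\zeta'(s)/\zeta(s)$ are $\Lambda(n)$, equal to $\log p$ when $n = p^m$ with $m \ge 1$ and $0$ otherwise. One way to see this is that $\zeta(s)\cdot\big(-\zeta'(s)/\zeta(s)\big) = -\zeta'(s) = \sum_{n\ge1}\frac{\log n}{n^s}$, which amounts to the identity $\sum_{d\mid n}\Lambda(d) = \log n$ (from $n = \prod p^{e}$, each prime power $p^j \mid n$ contributing $\log p$). This minimal Python sketch checks that identity for small $n$; the helper names are our own.

```python
from math import log

def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p^m with m >= 1, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n and n % p:     # smallest divisor p >= 2 of n with p*p <= n, if any
        p += 1
    if n % p:                       # no divisor up to sqrt(n): n is prime
        return log(n)
    while n % p == 0:               # strip every factor p
        n //= p
    return log(p) if n == 1 else 0.0

def divisor_sum_of_lambda(n: int) -> float:
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)

for n in (12, 30, 64, 97):
    print(n, round(divisor_sum_of_lambda(n), 10), round(log(n), 10))
```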


The sum in the Dirichlet series for $\zeta(s) = \sum_{n\ge1}1/n^s$ is absolutely convergent only when $\mathrm{Re}(s) > 1$. The theory of analytic continuation allows one to extend the function to the whole complex plane so that $(s-1)\zeta(s)$ is differentiable, arbitrarily often, everywhere in $\mathbb{C}$.²⁶ It is too complicated to go into all of the details here,²⁷ but it leads to Riemann's amazing exact formula


$$\sum_{\substack{p\text{ prime},\ m\ge1\\ p^m\le x}}\log p = x - \sum_{\rho:\ \zeta(\rho)=0}\frac{x^\rho}{\rho} - \frac{\zeta'(0)}{\zeta(0)}, \tag{5.16.1}$$


relating the number of prime powers up to x to the zeros, ρ, of the analytic continuation of ζ(s). This is a surprising thing to do. We take a nice elementary question, the count of the primes up to x, and relate it to some sum over the (infinite set of) zeros of an analytic continuation. What is great is that the estimate x, expected for the left-hand side, pops out of the calculation. But in order to show that this is really a good estimate for our sum over primes, we will need to establish that the sum over the zeros ρ on the right-hand side of (5.16.1) is small compared to x.
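The left-hand side of (5.16.1) is easy to compute directly. The following Python sketch is our own illustration (the names `primes_up_to` and `psi` are ours, not the book’s): it sieves the primes and adds log p once for every prime power p^m ≤ x, so that the sum can be set beside the main term x.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if flags[i]]

def psi(x):
    """Left-hand side of (5.16.1): sum of log p over prime powers p^m <= x."""
    total = 0.0
    for p in primes_up_to(int(x)):
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

for x in (10, 100, 1000, 10**4, 10**5):
    print(x, round(psi(x), 2), round(psi(x) - x, 2))
```

Once x is moderately large the difference psi(x) − x is small relative to x; that difference is what the sum over the zeros in (5.16.1) has to account for.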

How big is each x^ρ/ρ term? The key point is that |x^ρ| = x^{Re(ρ)}, so the smaller the real parts of the zeros ρ, the smaller the x^ρ/ρ terms in comparison to x, especially as x grows. Indeed, by establishing that each Re(ρ) < 1, Hadamard and de la Vallée Poussin both proved the prime number theorem in 1896. Riemann himself calculated a few zeros of ζ(s),²⁸ and all the real parts seemed to be 1/2, which led him to his famous conjecture:


Conjecture 5.1 (The Riemann Hypothesis). 


If ζ(ρ) = 0 with 0 < Re(ρ) < 1, then Re(ρ) = 1/2.


25 Riemann himself used log ζ(s), which also only has non-zero coefficients when n is a prime power. Today people prefer to use −ζ′(s)/ζ(s) since it makes the calculus significantly easier.

26 Euler was able to determine the values of ζ(0), ζ(−1), ζ(−2), ...; these values have no obvious connection to the series that defines ζ(s) when Re(s) > 1. For example, if s = −1, then, as 1/n^{−1} = n for each integer n ≥ 1, one has the series

    1 + 2 + 3 + 4 + ···,

which is evidently not the same as ζ(−1) = −1/12. In fact ζ(1 − n) = −B_n/n, where B_n is the nth Bernoulli number, for each integer n ≥ 2.

27 Though see [1] for a wonderful introduction, expanded on in my book [Graal].

28 Riemann wrote: “It is very probable that all roots are [of the form 1/2 + it with t] real. Certainly one would wish for a stricter proof ... I have meanwhile temporarily put aside the search for this after some fleeting futile attempts.”


194 Appendix 5D. Working with Riemann’s zeta-function 


If the Riemann Hypothesis is true, then each |x^ρ| = x^{1/2}, from which one can deduce that if x ≥ 3, then

$$\Big| \sum_{\substack{p \text{ prime},\ m \ge 1 \\ p^m \le x}} \log p - x \Big| \le x^{1/2} (\log x)^2.$$

This in turn implies that

$$|\pi(x) - \mathrm{Li}(x)| \le x^{1/2} \log x.$$

The implications go two ways: if these bounds are true, then the Riemann Hypothesis is true!
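Under the Riemann Hypothesis these bounds hold for every x ≥ 3. A finite computation says nothing about the Hypothesis itself, but a quick numerical check, sketched below in Python, shows how much room the bounds leave at a few sample points. We take Li(x) = ∫_2^x dt/log t, computed by Simpson’s rule after the substitution t = e^u; all function names are ours.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if flags[i]]

def psi(x):
    """Sum of log p over prime powers p^m <= x."""
    total = 0.0
    for p in primes_up_to(int(x)):
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total

def li_from_2(x, steps=2000):
    """Li(x) = integral of dt/log t from 2 to x, by Simpson's rule after the
    substitution t = e^u (integrand e^u / u on [log 2, log x]); steps must be even."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u
    return total * h / 3

def rh_bounds_hold(x):
    """Check both displayed bounds at one value x >= 3."""
    pi_x = len(primes_up_to(int(x)))
    ok_psi = abs(psi(x) - x) <= math.sqrt(x) * math.log(x) ** 2
    ok_pi = abs(pi_x - li_from_2(x)) <= math.sqrt(x) * math.log(x)
    return ok_psi and ok_pi

for x in (3, 10, 100, 10**3, 10**4, 10**5, 10**6):
    print(x, rh_bounds_hold(x))
```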

We still do not know whether the Riemann Hypothesis is true, but the available evidence points towards it. For example, we know that the 10^{10} zeros nearest to, but not on, the real axis all lie on the line Re(s) = 1/2. The Riemann Hypothesis is a famously difficult question and appears on every list of famous problems in mathematics.²⁹ It is said that the person who resolves the Riemann Hypothesis will become immortal, literally.³⁰


Exercise 5.16.1. Prove that ζ(s) has no zeros ρ with Re(ρ) > 1.


The key to all of this is Riemann’s extraordinary identity (5.16.1). Where does 
it come from? How should we think about it? We will address these questions in 
the next section. 


5.17. Understanding the zeros 


Fourier analysis is built on the idea that most functions on [0, 1) can be given as an (infinite) sum of sines and cosines. The first interesting example is given by

$$x - \frac{1}{2} = -2 \sum_{n \ge 1} \frac{\sin(2\pi n x)}{2\pi n},$$

which can be shown to hold whenever 0 < x < 1. This representation of x − 1/2 by an infinite sum is not something one can calculate term by term in practice, and so one hopes to approximate x − 1/2 by the sum of the first few terms of the series. The first term −(1/π) sin(2πx) does not provide a good approximation, but even the sum of the first two, −(1/π) sin(2πx) − (1/(2π)) sin(4πx), gives the general shape of x − 1/2. By the time we add the sine terms for n from 1 to 100, we get a very good approximation.
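These partial sums are easy to tabulate. The Python sketch below (function names ours) measures the worst error of the first few partial sums against x − 1/2 at the sample points x = 0.10, 0.11, ..., 0.90, which stay away from the jumps at 0 and 1.

```python
import math

def partial_sum(x, terms):
    """First `terms` terms of -2 * sum_{n >= 1} sin(2*pi*n*x) / (2*pi*n)."""
    return -2 * sum(math.sin(2 * math.pi * n * x) / (2 * math.pi * n)
                    for n in range(1, terms + 1))

def max_error(terms, samples=81):
    """Largest |partial sum - (x - 1/2)| over x = 0.10, 0.11, ..., 0.90."""
    xs = [0.1 + 0.8 * k / (samples - 1) for k in range(samples)]
    return max(abs(partial_sum(x, terms) - (x - 0.5)) for x in xs)

for terms in (1, 2, 10, 100):
    print(terms, round(max_error(terms), 4))
```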


29 Sometimes with big financial rewards for resolving it, though there are easier ways to make a million dollars!

30 Those who have made the biggest leaps in our understanding, Hadamard and de la Vallée Poussin, lived to 97 and 95, respectively. Most important contributors have lived to a ripe old age.




In our formula for x − 1/2, the numbers 2πn (inside the “sin(2πnx)”) are the frequencies of the various component waves (controlling how fast they go through their cycles), while the coefficients −2/(2πn) are their amplitudes (controlling how far up and down they go).


Riemann’s formula is a bit too technical to appreciate fully. However, it can be rephrased simply, albeit rather surprisingly, in the following colloquial terms:

The primes can be counted as a sum of waves. 


Moreover, it will be significantly easier to understand, not to say more elegant,³¹ if we assume that the Riemann Hypothesis is true. With this assumption we can write the difference between π(x) and Li(x) as a (suitably weighted) sum of sine waves:


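$$\frac{\displaystyle\int_2^x \frac{dt}{\log t} - \#\{\text{primes} \le x\}}{\sqrt{x}/\log x} \;\approx\; 1 + 2 \sum_{\substack{\text{all real numbers } \gamma > 0 \text{ such that} \\ \frac{1}{2} + i\gamma \text{ is a zero of } \zeta(s)}} \frac{\sin(\gamma \log x)}{\gamma}. \tag{5.17.1}$$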


The numerator of the left-hand side of this formula is the error term when comparing the Gauss prediction Li(x) with the actual count π(x) for the number of primes up to x. We saw earlier that the overcounts seemed to be roughly the size of the square root of x, so the denominator √x/log x appears to be an appropriate thing to divide through by. The right side of the formula bears much in common with our formula for x − 1/2. It is a sum of sine functions, with the numbers γ employed in two different ways in place of 2πn: Each γ is used inside the sine (as the “frequency”), and the reciprocal of each γ forms the coefficient of the sine (as the “amplitude”). We even get the same factor of 2 in each formula. However, the numbers γ here are much more subtle than the straightforward numbers 2πn in the corresponding formula for x − 1/2.


31 The aesthete believes the Riemann Hypothesis because its consequences, like this formula, are too beautiful and well-fitting not to be true!




We can obtain approximations for the number of primes up to x by evaluating the right-hand side of this formula, summing over the first ten or hundred zeros 1/2 + iγ of ζ(s) (or, more accurately, using the formula in (5.16.1)). Here we can order the zeros by the size of γ. The first few zeros give

γ₁ = 14.1347..., γ₂ = 21.0220..., γ₃ = 25.0108..., γ₄ = 30.4248....

(These do not seem to be rationals nor numbers that can easily be identified in some other context.) For example, we approximate the left-hand side of (5.16.1) for x up to 20 by using the first ten zeros, then the first 100, and finally the first 1000 zeros.


[Figure: three panels over 10 ≤ x ≤ 20, using the first 10, 100 and 1000 zeros, respectively.]

Figure 5.1. Using Riemann’s zeros to approximate the count of primes


The step function in these graphs is the actual function on the left-hand side of (5.16.1); the more complicated function, constructed of waves, is the sum of the first few terms of the right-hand side. By the time we use a thousand zeros, the function and its approximation are indistinguishable to the naked eye.
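A small version of this experiment can be run for (5.17.1). The Python sketch below evaluates both sides using only the first ten zeros. The text quotes only γ₁, ..., γ₄; the ten values in `GAMMAS` are the standard published ones to nine decimals and are an assumption of this sketch, as are all function names. Ten zeros is a crude truncation, so only rough agreement should be expected.

```python
import math

# Imaginary parts of the first ten zeros 1/2 + i*gamma of zeta(s) (assumed standard values).
GAMMAS = [14.134725142, 21.022039639, 25.010857580, 30.424876126, 32.935061588,
          37.586178159, 40.918719012, 43.327073281, 48.005150881, 49.773832478]

def li_from_2(x, steps=2000):
    """Li(x) = integral of dt/log t from 2 to x (Simpson's rule after t = e^u)."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u
    return total * h / 3

def prime_count(x):
    """pi(x) by trial division (fine for the small x used here)."""
    return sum(1 for n in range(2, int(x) + 1)
               if all(n % d for d in range(2, math.isqrt(n) + 1)))

def lhs(x):
    """Left-hand side of (5.17.1): (Li(x) - pi(x)) / (sqrt(x) / log x)."""
    return (li_from_2(x) - prime_count(x)) / (math.sqrt(x) / math.log(x))

def rhs(x, gammas=GAMMAS):
    """Right-hand side of (5.17.1), truncated to the zeros in `gammas`."""
    return 1 + 2 * sum(math.sin(g * math.log(x)) / g for g in gammas)

for x in (20.5, 50.5, 100.5, 500.5, 1000.5):
    print(x, round(lhs(x), 3), round(rhs(x), 3))
```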


Riemann’s paper and the subsequent observations gave birth to the subject 
known today as analytic number theory. There are many books on the subject, 
including [Graal], in which we will develop these and other ideas so as to be better 
able to count primes. 


An elementary proof 


Riemann’s formula shows that estimates for the number of primes up to x and understanding the location of the zeros of ζ(s) are more or less tautologous questions. This led to a widespread belief that there could not be an “elementary proof” of the prime number theorem, one which avoids the zeros of ζ(s). It therefore came as a great shock when, in 1949, Selberg and Erdős gave elementary (but complicated) proofs. At the heart is Selberg’s elementary estimate for the number of (suitably weighted) integers up to x that are the product of at most two primes:

$$\Big| \sum_{\substack{p \le x \\ p \text{ prime}}} (\log p)^2 + \sum_{\substack{pq \le x \\ p, q \text{ primes}}} (\log p)(\log q) - 2x \log x \Big| \le Cx,$$

for some constant C > 0.
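Selberg’s estimate can be watched numerically. The Python sketch below (names ours) computes the quantity inside the absolute value, taking the double sum over ordered pairs of primes (p, q) with pq ≤ x as in the usual statement, and prints it divided by x, which the estimate says stays bounded.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if flags[i]]

def selberg_quantity(x):
    """sum_{p <= x} (log p)^2 + sum_{pq <= x} (log p)(log q) - 2 x log x,
    with the double sum over ordered pairs of primes (p, q)."""
    ps = primes_up_to(int(x))
    logs = [math.log(p) for p in ps]
    total = sum(t * t for t in logs)
    for i, p in enumerate(ps):
        if 2 * p > x:
            break                      # no prime q with p q <= x
        for j, q in enumerate(ps):
            if p * q > x:
                break
            total += logs[i] * logs[j]
    return total - 2 * x * math.log(x)

for x in (10, 10**3, 10**4, 10**5):
    print(x, round(selberg_quantity(x) / x, 3))
```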




5.18. Reformulations of the Riemann Hypothesis 


Each of the following estimates is equivalent to the Riemann Hypothesis. Fix ε > 0. If N is sufficiently large, then

• |log(lcm[1, 2, ..., N]) − N| ≤ N^{1/2+ε},
• |Σ_{n≤N} μ(n)| ≤ N^{1/2+ε},
• |#{n ≤ N : Ω(n) is even} − #{n ≤ N : Ω(n) is odd}| ≤ N^{1/2+ε},

where Ω(n) denotes the number of prime factors of n, counting multiplicities. The connections can be seen through the following exercise:


Exercise 5.18.1. (a) Prove that log(lcm[1, 2, ..., N]) = Σ_{p prime, m ≥ 1, p^m ≤ N} log p.
(b)* Use (ILI) to show that Σ_{p^m ≤ N} log p = Σ_{a, b ≥ 1: ab ≤ N} μ(b) log a.
(c) Express μ(n) in terms of Ω(n) and ω(n).


A key difficulty in counting primes lies in their definition, given in terms of what they are not (they are not the product of two integers > 1) rather than in terms of what they are. The advantage of the formulation in exercise 5.18.1(b), in terms of sums of the Möbius function, is that this is a multiplicative function, built up constructively, and so lends itself more naturally to elementary manipulations. This leads to an alternative approach to analytic number theory without deep complex analysis (see [GS]).
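The three quantities in these reformulations can be tabulated directly. The Python sketch below (names ours) uses a smallest-prime-factor sieve to get μ(n) and Ω(n), and `math.lcm` (Python 3.9 or later) for the least common multiple. It reports log lcm[1, ..., N] − N, the sum of μ(n), and the even-minus-odd count for Ω(n), beside N^{1/2}.

```python
import math

def mobius_and_big_omega(n):
    """Lists mu[0..n] and big_omega[0..n] from a smallest-prime-factor sieve."""
    spf = list(range(n + 1))
    for i in range(2, math.isqrt(n) + 1):
        if spf[i] == i:
            for j in range(i * i, n + 1, i):
                if spf[j] == j:
                    spf[j] = i
    mu, big_omega = [0] * (n + 1), [0] * (n + 1)
    mu[1] = 1
    for k in range(2, n + 1):
        p = spf[k]
        m = k // p
        big_omega[k] = big_omega[m] + 1
        mu[k] = 0 if m % p == 0 else -mu[m]   # p^2 | k forces mu(k) = 0
    return mu, big_omega

def reformulation_errors(N):
    """(log lcm[1..N] - N, sum of mu(n), #{Omega even} - #{Omega odd}) for n <= N."""
    mu, big_omega = mobius_and_big_omega(N)
    log_lcm = math.log(math.lcm(*range(1, N + 1)))
    mertens = sum(mu[1:])
    liouville = sum(1 if big_omega[k] % 2 == 0 else -1 for k in range(1, N + 1))
    return log_lcm - N, mertens, liouville

for N in (10, 100, 1000, 10**4):
    e = reformulation_errors(N)
    print(N, round(e[0], 2), e[1], e[2], round(math.sqrt(N), 1))
```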


Exercise 5.18.2. For any integer m ≥ 1:
(a) Prove that there exists a constant C_m such that if x ≥ 2, then

$$\int_2^x \frac{dt}{(\log t)^m} = C_m + \frac{x}{(\log x)^m} + m \int_2^x \frac{dt}{(\log t)^{m+1}}.$$

(b) Prove that there exists a constant C_m such that if x ≥ 2, then

$$\mathrm{Li}(x) = C_m + \sum_{k=1}^{m} \frac{(k-1)!\, x}{(\log x)^k} + m! \int_2^x \frac{dt}{(\log t)^{m+1}}.$$

(c)† Prove that there exists a constant K_m such that if x ≥ 3, then

$$0 \le \int_2^x \frac{dt}{(\log t)^{m+1}} \le \frac{K_m\, x}{(\log x)^{m+1}}.$$
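Exercise 5.18.2(b) says that the explicit terms (k − 1)! x/(log x)^k approximate Li(x) well. A quick Python comparison at x = 10^6 follows (names ours); Li(x) = ∫_2^x dt/log t is computed by Simpson’s rule after substituting t = e^u. The terms cannot be added indefinitely: consecutive terms have ratio k/log x, so they eventually grow and the series is only asymptotic.

```python
import math

def li_from_2(x, steps=4000):
    """Li(x) = integral of dt/log t from 2 to x (Simpson's rule after t = e^u)."""
    a, b = math.log(2), math.log(x)
    h = (b - a) / steps
    total = math.exp(a) / a + math.exp(b) / b
    for i in range(1, steps):
        u = a + i * h
        total += (4 if i % 2 else 2) * math.exp(u) / u
    return total * h / 3

def li_main_terms(x, m):
    """sum_{k=1}^{m} (k-1)! x / (log x)^k, the explicit terms in exercise 5.18.2(b)."""
    L = math.log(x)
    return sum(math.factorial(k - 1) * x / L ** k for k in range(1, m + 1))

x = 10**6
exact = li_from_2(x)
print("Li(x) =", round(exact, 2))
for m in range(1, 7):
    approx = li_main_terms(x, m)
    print(m, round(approx, 1), f"relative error {abs(exact - approx) / exact:.3%}")
```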


Exercise 5.18.3. (a) Prove that the complex conjugate of n^ρ equals n^{ρ̄} for any integer n ≥ 1 and ρ ∈ ℂ.
(b) Explain why if ζ(ρ) = 0, then ζ(ρ̄) = 0.
(c) Show that if ρ = 1/2 + iγ, then

$$\frac{x^{\rho}}{\rho} + \frac{x^{\bar\rho}}{\bar\rho} = x^{1/2} \cdot \frac{\cos(\gamma \log x) + 2\gamma \sin(\gamma \log x)}{\frac{1}{4} + \gamma^2}.$$

(d) Show that if γ is large, then the expression in (c) is roughly x^{1/2} · 2 sin(γ log x)/γ.
This exercise explains how (5.16.1) yields the approximation (5.17.1).
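Parts (c) and (d) can be sanity-checked numerically. This Python sketch (names ours) compares the value of x^ρ/ρ + x^{ρ̄}/ρ̄ computed with complex arithmetic against the closed form in (c), and the closed form against the large-γ approximation in (d).

```python
import math

def pair_term(x, gamma):
    """x^rho/rho + x^conj(rho)/conj(rho) for rho = 1/2 + i*gamma, via complex arithmetic."""
    rho = complex(0.5, gamma)
    value = x ** rho / rho + x ** rho.conjugate() / rho.conjugate()
    return value.real  # the imaginary parts cancel

def closed_form(x, gamma):
    """The right-hand side in exercise 5.18.3(c)."""
    t = gamma * math.log(x)
    return math.sqrt(x) * (math.cos(t) + 2 * gamma * math.sin(t)) / (0.25 + gamma ** 2)

def large_gamma_approx(x, gamma):
    """The approximation x^(1/2) * 2 sin(gamma log x) / gamma from exercise 5.18.3(d)."""
    return math.sqrt(x) * 2 * math.sin(gamma * math.log(x)) / gamma

x = 1000.0
for gamma in (14.134725142, 100.0, 1000.0):
    print(gamma, pair_term(x, gamma), closed_form(x, gamma), large_gamma_approx(x, gamma))
```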


Primes and complex analysis 


[1] Brian J. Conrey, The Riemann hypothesis, Notices Amer. Math. Soc. 50 (2003), 341-353. 


Appendix 5E. Prime patterns: 
Consequences of the 
Green-Tao Theorem 


In 2008 Green and Tao proved that there are infinitely many k-term arithmetic progressions of primes, that is, non-zero integers a, d such that

a, a + d, ..., a + (k − 1)d

are all prime. The smallest arithmetic progression of ten primes is given by

199, 409, 619, 829, 1039, 1249, 1459, 1669, 1879, 2089,

which we can write as 199 + 210n, 0 ≤ n ≤ 9.


Length k | Arithmetic progression (0 ≤ n ≤ k − 1) | Last term
3 | 3 + 2n | 7
4 | 5 + 6n | 23
5 | 5 + 6n | 29
6 | 7 + 30n | 157
7 | 7 + 150n | 907
8 | 199 + 210n | 1669
9 | 199 + 210n | 1879
10 | 199 + 210n | 2089
11 | 110437 + 13860n | 249037
12 | 110437 + 13860n | 262897
13 | 4943 + 60060n | 725663
14 | 31385539 + 420420n | 36850999
15 | 115453391 + 4144140n | 173471351
16 | 53297929 + 9699690n | 198793279
17 | 3430751869 + 87297210n | 4827507229
18 | 4808316343 + 717777060n | 17010526363
19 | 8297644387 + 4180566390n | 83547839407
20 | 214861583621 + 18846497670n | 572945039351
21 | 5749146449311 + 26004868890n | 6269243827111

The k-term arithmetic progression of primes with smallest last term.
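The small rows of this table can be recomputed by brute force, and larger rows checked for primality. The Python sketch below uses a Miller–Rabin test with the thirteen prime bases up to 41, which is known to be deterministic for n < 3.3 · 10^24, far beyond the numbers here. The function names and the choice of rows to check are ours.

```python
def is_prime(n):
    """Deterministic Miller-Rabin for n < 3.3 * 10**24 (bases: the primes up to 41)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def smallest_last_term(k):
    """Brute force: (L, a, d) with L the smallest last term of a k-term
    progression of primes a, a + d, ..., L with d > 0."""
    L = 2
    while True:
        if is_prime(L):
            for d in range(1, (L - 2) // (k - 1) + 1):
                if all(is_prime(L - j * d) for j in range(1, k)):
                    return L, L - (k - 1) * d, d
        L += 1

# Some rows of the table: (k, first term, common difference, last term).
TABLE = [(8, 199, 210, 1669), (10, 199, 210, 2089), (11, 110437, 13860, 249037),
         (12, 110437, 13860, 262897), (13, 4943, 60060, 725663)]

for k, a, d, last in TABLE:
    terms = [a + n * d for n in range(k)]
    print(k, terms[-1] == last and all(is_prime(t) for t in terms))
for k in range(3, 8):
    print(k, smallest_last_term(k))
```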




Despite there being arbitrarily long arithmetic progressions of primes, it is not easy to find long ones. The longest explicitly known is the first 26 terms of


3486107472997423 + 371891575525470n. 


Green and Tao proved that the smallest k-term arithmetic progressions of primes 
are all 


although we might guess that ≤ k! + 1 is true, for each k ≥ 3.


5.19. Generalized arithmetic progressions of primes 


There are squares filled with primes, which are in arithmetic progression when one looks along any row or any column, like

 5 |  17 |  29
47 |  59 |  71
89 | 101 | 113

29 |  41 |  53
59 |  71 |  83
89 | 101 | 113

 503 | 1721 | 2939 | 4157
 863 | 2081 | 3299 | 4517
1223 | 2441 | 3659 | 4877
1583 | 2801 | 4019 | 5237


Are there such squares of arbitrary size? The entries of such an N-by-N square can be parametrized as

a + mb + nc with 0 ≤ m, n ≤ N − 1.

This is a 2-dimensional generalization of an arithmetic progression. How about three dimensions?


 47 | 383 | 719
179 | 431 | 683
311 | 479 | 647

149 | 401 | 653
173 | 347 | 521
197 | 293 | 389

251 | 419 | 587
167 | 263 | 359
 83 | 107 | 131

The three layers of a 3-by-3-by-3 Balog cube of primes.


The arithmetic progressions in this cube run in three directions, along each row 
and along each column and up through the layers. For example the top left entries 
of each layer, 47, 149, 251, are in arithmetic progression, as are the primes in the 
center of each layer, 431, 347, 263. 

What about cubes of primes of higher dimension? The (n_1, n_2, ..., n_d) entry of an N-by-N-by-···-by-N cube, in d dimensions, where each n_i lies in the range 0 ≤ n_i ≤ N − 1, is given by

a + n_1 b_1 + n_2 b_2 + ··· + n_d b_d

for some integers a, b_1, ..., b_d. If we let each b_i = N^{i−1} q for some integer q, then the entries are

a + nq where 0 ≤ n ≤ k − 1, with k = N^d,

writing n = n_1 + n_2 N + n_3 N² + ··· + n_d N^{d−1} in base N.

The Green-Tao Theorem states that there exist k-term arithmetic progressions of primes, a + nq, 0 ≤ n ≤ k − 1, and so this gives rise to an N-by-···-by-N Balog cube of primes by the above construction.
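Both the displayed cube and the construction are easy to test by machine. The Python sketch below (names ours) stores a box as a dictionary from index tuples (n_1, ..., n_d) to entries, checks that every entry is prime and every line parallel to an axis is an arithmetic progression, and builds a box from a progression a + nq by writing n in base N as above.

```python
import math
from itertools import product

def is_prime(n):
    """Trial division; adequate for the small entries here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def is_prime_box(box, N, d):
    """box maps each index tuple in {0, ..., N-1}^d to an integer. True if every entry
    is prime and every line parallel to an axis is an arithmetic progression."""
    if not all(is_prime(v) for v in box.values()):
        return False
    for idx in box:
        for axis in range(d):
            if idx[axis] + 2 < N:
                nxt = idx[:axis] + (idx[axis] + 1,) + idx[axis + 1:]
                nxt2 = idx[:axis] + (idx[axis] + 2,) + idx[axis + 1:]
                if box[nxt] - box[idx] != box[nxt2] - box[nxt]:
                    return False
    return True

def box_from_progression(a, q, N, d):
    """Entry (n_1, ..., n_d) is a + n q, where n = n_1 + n_2 N + ... + n_d N^(d-1)."""
    return {idx: a + q * sum(n_i * N ** i for i, n_i in enumerate(idx))
            for idx in product(range(N), repeat=d)}

# The Balog cube shown above, as LAYERS[layer][row][column].
LAYERS = [[[47, 383, 719], [179, 431, 683], [311, 479, 647]],
          [[149, 401, 653], [173, 347, 521], [197, 293, 389]],
          [[251, 419, 587], [167, 263, 359], [83, 107, 131]]]
cube = {(r, c, l): LAYERS[l][r][c] for r, c, l in product(range(3), repeat=3)}

print(is_prime_box(cube, 3, 3))
print(is_prime_box(box_from_progression(199, 210, 3, 2), 3, 2))
```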




Consecutive prime values of a polynomial 


Arithmetic progressions a + nd, n = 1,2,..., can be viewed as the values of a 
degree-one polynomial. Hence the Green-Tao Theorem can be rephrased as stating 
that for any k there are infinitely many different degree-one polynomials such that 
their first k values are prime. 


How about degree-two polynomials? A famous example is the quadratic polynomial n² + n + 41, which is prime for n = 0, 1, ..., 39 (which we will study in detail in section 12.5); there is no other monic quadratic polynomial that has such a long run of primes compared to the size of its coefficients. Other examples of quadratic polynomials known to have a long run of initial prime values are 36n² − 810n + 2753 and 36n² − 2358n + 36809, both of which give distinct primes (in absolute value) for n = 0, 1, ..., 44. Are there quadratic polynomials whose first k values are prime, for any given integer k ≥ 1? And if so, are there degree d polynomials whose first k values are prime? We resolve these questions by using the Green-Tao Theorem:


We begin with a k^d-term arithmetic progression of primes,

a + jb for every integer j in the range 0 ≤ j ≤ k^d − 1.

Then a + bn^d is prime for every integer n in the range 0 ≤ n ≤ k − 1. Restated, this gives:

The first k values of the polynomial bx^d + a are all prime.
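The construction can be tried with the progression 199 + 210j from the table: its first nine terms form a k^d-term progression with k = 3 and d = 2. The Python sketch below (names ours) counts how many initial values of a polynomial are prime, first for n² + n + 41 and then for 210x² + 199.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small values here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def prime_run(f):
    """Number of consecutive n = 0, 1, 2, ... for which f(n) is prime."""
    n = 0
    while is_prime(f(n)):
        n += 1
    return n

# Prime for n = 0, ..., 39, while 40^2 + 40 + 41 = 41^2.
print(prime_run(lambda n: n * n + n + 41))

# k = 3, d = 2 with the progression 199 + 210 j: the values b n^2 + a for
# n = 0, 1, 2 are the terms j = 0, 1, 4.
a, b = 199, 210
print([b * n * n + a for n in range(3)], prime_run(lambda n: b * n * n + a))
```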


This technique does not yield monic polynomials, since we cannot control the value of b. To achieve this we need the extra power of the Green-Tao-Ziegler Theorem (Theorem 5.8). Consider the set of k linear forms

b + a + 1, b + 2a + 4, b + 3a + 9, ..., b + ka + k².


Exercise 5.19.1. Show that if i ≠ j are integers and a and b are variables, then there do not exist integers u, v, w, not all zero, for which u(b + ia + i²) + v(b + ja + j²) = w.


We can show there is no fixed prime divisor: For each prime p, select a ≡ a_p ≡ −1 (mod p), so that b + ja + j² ≡ b + j² − j (mod p). Now

$$S_p := \{j - j^2 \ (\mathrm{mod}\ p) : j \in \mathbb{Z}\} = \{j - j^2 \ (\mathrm{mod}\ p) : j = 0, \ldots, p - 1\}$$

by Corollary 2.3.1. Since j − j² ≡ 0 (mod p) for j ≡ 0 and 1 (mod p), the set S_p contains ≤ p − 1 distinct elements, so we can let b ≡ b_p (mod p), where b_p is a residue that is not in S_p. Therefore b + an + n² ≢ 0 (mod p) for all integers n, 1 ≤ n ≤ k.
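The residue computation just described can be carried out mechanically, one prime at a time. The Python sketch below (names ours) builds S_p for small primes p, picks the least residue b_p outside it, and confirms that b_p + a_p n + n² is never divisible by p when a_p ≡ −1 (mod p); combining the choices for different p is not part of the sketch.

```python
def excluded_residue(p):
    """With a = -1 (mod p), b + a j + j^2 = b + j^2 - j (mod p). Return the set
    S_p = {j - j^2 mod p} and the least residue b_p outside it."""
    S = {(j - j * j) % p for j in range(p)}
    b_p = min(r for r in range(p) if r not in S)   # exists since |S_p| <= p - 1
    return S, b_p

for p in (2, 3, 5, 7, 11, 13):
    S, b_p = excluded_residue(p)
    a_p = p - 1                                     # a = -1 (mod p)
    never_zero = all((b_p + a_p * n + n * n) % p != 0 for n in range(p))
    print(p, sorted(S), b_p, never_zero)
```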


We can therefore deduce from the Green-Tao-Ziegler Theorem that there exist 
infinitely many pairs of integers a and b such that: 


The first k values of the polynomial x² + ax + b are all prime.


Exercise 5.19.2. Prove that there exist infinitely many pairs of integers a and b such that the first k values of the polynomial x⁴ + ax + b are all prime.




Magic squares of primes 


We discussed constructing magic squares in section of appendix 1C. Here are 
two small examples, whose entries are primes. 


 17 | 89 |  71
113 | 59 |   5
 47 | 29 | 101

 41 | 89 |  83
113 | 71 |  29
 59 | 53 | 101

Examples of 3-by-3 magic squares of primes.


Do you recognize the primes involved? Do you notice any similarities with the 
examples of 3-by-3 squares of primes above? 


Exercise 5.19.3. Prove that every 3-by-3 square of integers in arithmetic progressions along each 
row and column can be rearranged to form a 3-by-3 magic square and vice versa. 

37 | 83 | 97 |  41
53 | 61 | 71 |  73
89 | 67 | 59 |  43
79 | 47 | 31 | 101

 41 | 71 | 103 | 61
 97 | 79 |  47 | 53
 37 | 67 |  83 | 89
101 | 59 |  43 | 73

Examples of 4-by-4 magic squares of primes.


It has long been known that there are n-by-n normal magic squares for any n ≥ 3. We will take one, whose entries m_{i,j}, 1 ≤ i, j ≤ n, are the distinct integers between 1 and n². The square with (i, j)th entry a + m_{i,j} b is also an n-by-n magic square (exercise 1.13.1). The Green-Tao Theorem implies that there are infinitely many pairs of integers a, b for which all of the integers a + ℓb, 1 ≤ ℓ ≤ n², are prime; in particular, each a + m_{i,j} b is prime. This then yields infinitely many n-by-n magic squares of primes for all integers n ≥ 3.
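Here is a small Python instance of this construction (names ours), with n = 3: start from the normal magic square with rows 2, 7, 6 / 9, 5, 1 / 4, 3, 8, and take a = −11, b = 210, so that a + ℓb, 1 ≤ ℓ ≤ 9, runs through the first nine terms of the progression 199 + 210n from the table. The same `is_magic` check confirms the squares displayed above.

```python
import math

def is_prime(n):
    """Trial division; adequate for the small entries here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def is_magic(square):
    """All rows, columns and both diagonals have the same sum."""
    n = len(square)
    target = sum(square[0])
    lines = [list(row) for row in square]
    lines += [[square[i][j] for i in range(n)] for j in range(n)]
    lines.append([square[i][i] for i in range(n)])
    lines.append([square[i][n - 1 - i] for i in range(n)])
    return all(sum(line) == target for line in lines)

LO_SHU = [[2, 7, 6], [9, 5, 1], [4, 3, 8]]   # a normal 3-by-3 magic square
a, b = -11, 210                               # a + l b = 199, 409, ..., 1879 (l = 1..9)
prime_square = [[a + m * b for m in row] for row in LO_SHU]
print(prime_square)
print(is_magic(prime_square), all(is_prime(v) for row in prime_square for v in row))
```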


Primes as averages 


Balog showed that there exist arbitrarily large sets A of distinct primes such that for any a, b ∈ A the average (a + b)/2 is also prime (and all of these averages are distinct):


Exercise 5.19.4. Prove that the averages of any two distinct elements of the set 2, 2², 2³, ..., 2^m are distinct.


The Green-Tao Theorem implies that there are infinitely many pairs of integers a, b for which all of the integers a + nb, 1 ≤ n ≤ k = 2^m, are prime, and so let

A = {a + 2b, a + 4b, a + 8b, ..., a + 2^m b}.


Exercise 5.19.5. Prove that the averages of any two distinct elements of A are distinct and 
prime. 


Exercise 5.19.6.† Prove that there exist arbitrarily large sets A of primes such that the average of any subset of A yields a distinct prime (e.g., {7, 19, 67}, {5, 17, 89, 1277} and {209173, 322573, 536773, 1217893, 2484733}).
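These statements can be checked on small cases. In the Python sketch below (names ours) we take m = 3 and the progression a + nb = 199 + 210(n − 1), 1 ≤ n ≤ 8, from the table (so a = −11, b = 210), form A = {a + 2b, a + 4b, a + 8b}, and test its pairwise averages. We also test all subset averages for the first two examples in exercise 5.19.6. For the first construction, only averages of pairs are claimed to be prime.

```python
import math
from itertools import combinations

def is_prime(n):
    """Trial division; adequate for the small values here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def averages(A, sizes):
    """Averages of all subsets of A whose size lies in `sizes`; None if one is not an integer."""
    out = []
    for r in sizes:
        for sub in combinations(A, r):
            if sum(sub) % r:
                return None
            out.append(sum(sub) // r)
    return out

def all_distinct_primes(values):
    return values is not None and len(set(values)) == len(values) and all(map(is_prime, values))

a, b = -11, 210                      # a + n b = 199 + 210 (n - 1), prime for 1 <= n <= 8
A = [a + 2 * b, a + 4 * b, a + 8 * b]
print(A, averages(A, [2]), all_distinct_primes(averages(A, [2])))

for example in ([7, 19, 67], [5, 17, 89, 1277]):
    avgs = averages(example, range(1, len(example) + 1))
    print(example, sorted(avgs), all_distinct_primes(avgs))
```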


Appendix 5F. A panoply 
of prime proofs 


The main idea in Euclid’s proof is, given a list of primes p_1, ..., p_k, to find an integer q > 1 that is coprime to p_1 ··· p_k, so that its prime factors are not in that list. Euclid took q = p_1 ··· p_k + 1, but he might have taken q to be p_1 ··· p_k − 1, or m p_1 ··· p_k + 1 for any integer m ≠ 0.

We could also split our list of primes into any two subsets, M ∪ N, and let

q = m + n, where m = ∏_{p∈M} p and n = ∏_{p∈N} p.


Exercise 5.20.1. Show that (q,mn) = 1 and deduce that q has a prime factor not on our list. 


One could take q = |m − n|, as long as this is not 1.³² One can have more than two summands: If N = p_1 ··· p_k, let q = Σ_i N/p_i. Now p_i divides N/p_j whenever i ≠ j, so that (q, p_i) = (N/p_i, p_i) = 1.

In 2017 Meštrović gave a nice new proof: Suppose that q is the product of the finite set of odd primes, so that q > 5. Now q − 2 is odd but cannot be divisible by an odd prime (or else that prime would divide the difference, 2, between q and q − 2), and therefore q − 2 = 1; that is, q = 3, a contradiction.
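Each of these choices of q can be checked on a concrete list. The Python sketch below (names ours) uses the primes up to 17 with M = {5, 11, 13}, which is exactly the split behind footnote 32: there m = 715 and n = 714, so |m − n| = 1. That choice of q is useless, since it has no prime factors, although it is trivially coprime to every prime on the list.

```python
import math

def euclid_variants(primes, M):
    """Several choices of q from the text, for a list of primes split into M and the rest."""
    N = math.prod(primes)             # p_1 ... p_k
    m = math.prod(M)                  # product over the subset M
    n = N // m                        # product over the complementary subset
    return {
        "N + 1": N + 1,
        "N - 1": N - 1,
        "m + n": m + n,
        "|m - n|": abs(m - n),
        "sum of N/p_i": sum(N // p for p in primes),
    }

primes = [2, 3, 5, 7, 11, 13, 17]
for name, q in euclid_variants(primes, [5, 11, 13]).items():
    print(f"{name:>12}: q = {q}, gcd(q, N) = {math.gcd(q, math.prod(primes))}")
```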


Furstenberg’s (point-set) topological proof 


One of the most elegant ways to present Euclid’s idea is in Furstenberg’s extraordinary proof using basic notions of point-set topology:

Define a topology on the set of integers ℤ in which a set S is open if it is empty or if for every a ∈ S there is an arithmetic progression


$$Z(a, m) := \{a + nm : n \in \mathbb{Z}\},$$

with m ≠ 0, which is a subset of S. Obviously each Z(a, m) is open, and it is also closed since

$$Z(a, m) = \mathbb{Z} \setminus \bigcup_{\substack{b:\ 0 \le b \le |m| - 1 \\ b \not\equiv a \ (\mathrm{mod}\ m)}} Z(b, m).$$

If there are only finitely many primes p, then A = ∪_p Z(0, p) is also closed, being a finite union of closed sets, and so ℤ \ A = {−1, 1} is open. But this is false, since {−1, 1} is finite and so cannot contain any arithmetic progression Z(1, m), as this would contain infinitely many integers. This contradiction implies that there are infinitely many primes.

32 There are a couple of examples known where m − n = 1. Most famously, at least for baseball aficionados, Babe Ruth’s record of 714 = 2 × 3 × 7 × 17 home runs was overtaken by Hank Aaron’s 715 = 5 × 11 × 13.


Some love the surprising sparse elegance of this proof. However, others dislike 
the way it obscures what is really going on. 


An analytic proof 


We count the positive integers up to x whose prime factors come only from a given set of primes P = {p_1 < p_2 < ··· < p_k}. These integers all take the form

$$p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{for some integers } e_i, \text{ each} \ge 0. \tag{5.20.1}$$

We are going to count the integers up to x = 2^m − 1, for an arbitrary integer m ≥ 1: For each j, the prime p_j ≥ 2, and every other p_i^{e_i} ≥ 1, so that

$$2^{e_j} \le p_j^{e_j} \le p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \le 2^m - 1.$$

This implies that e_j is at most m − 1, and so there are at most m possibilities for the integer e_j, the integers from 0 through to m − 1. Therefore the number of integers of the form (5.20.1), up to 2^m − 1, is

$$\le \prod_{j=1}^{k} \#\{\text{integers } e_j : 0 \le e_j \le m - 1\} = m^k.$$

Now if P (which contains 2, 3, 5, 7, 11, so that k ≥ 5) is the set of all primes, then every positive integer is of the form (5.20.1), and therefore the last equation implies that 2^m − 1 ≤ m^k for all integers m ≥ 1. We select m = 2^k, so that 2^{2^k} − 1 ≤ 2^{k²}, which forces 2^k ≤ k²; but this is false since k ≥ 5. We deduce that there cannot be finitely many primes.


This proof highlights counting arguments, the basis of analytic number theory. 
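The counting step can be checked directly. The Python sketch below (names ours) enumerates exponent vectors with 0 ≤ e_j ≤ m − 1, exactly the range allowed by the argument above, and counts the products that are at most 2^m − 1; the count never exceeds m^k.

```python
from itertools import product

def count_smooth(P, m):
    """Count integers 1 <= n <= 2^m - 1 whose prime factors all lie in P, by
    enumerating exponent vectors with 0 <= e_j <= m - 1 (the range used in the text)."""
    limit = 2 ** m - 1
    count = 0
    for exps in product(range(m), repeat=len(P)):
        n = 1
        for p, e in zip(P, exps):
            n *= p ** e
        if n <= limit:
            count += 1
    return count

for P, m in (([2], 5), ([2, 3], 4), ([2, 3, 5], 10)):
    print(P, m, count_smooth(P, m), m ** len(P))
```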


A proof by irrationality 


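Euler exhibited the inspiring identity

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots.$$

Let δ(n) = 1 or −1 as n ≡ 1 or −1 (mod 4), and let δ(n) = 0 if n ≡ 0 (mod 2). This is a multiplicative function and so, by (4.9.1), we have

$$\frac{\pi}{4} = \sum_{n \ge 1} \frac{\delta(n)}{n} = \prod_{\substack{p \text{ prime} \\ p \equiv 1 \ (\mathrm{mod}\ 4)}} \Big(1 - \frac{1}{p}\Big)^{-1} \prod_{\substack{p \text{ prime} \\ p \equiv 3 \ (\mathrm{mod}\ 4)}} \Big(1 + \frac{1}{p}\Big)^{-1}.$$

It is well known that π (and so π/4) is irrational, but under the assumption that there are only finitely many primes, the right-hand side is a finite product of rational numbers, so it is itself rational, a contradiction.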
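Both sides of Euler’s identity converge, slowly, to π/4. This Python sketch (names ours) computes a partial sum of the series and a partial product over the odd primes up to X. The product converges only conditionally, with the primes taken in increasing order.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    flags = bytearray([1]) * (n + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if flags[i]]

def leibniz_partial(N):
    """sum_{n <= N} delta(n)/n: +1/n for n = 1 (mod 4), -1/n for n = 3 (mod 4), 0 for even n."""
    return sum((1 if n % 4 == 1 else -1) / n for n in range(1, N + 1, 2))

def euler_partial_product(X):
    """Product over odd primes p <= X of (1 - delta(p)/p)^(-1)."""
    result = 1.0
    for p in primes_up_to(X):
        if p == 2:
            continue                   # delta(2) = 0, so the factor is 1
        delta = 1 if p % 4 == 1 else -1
        result /= 1 - delta / p
    return result

for N in (10, 1000, 10**5):
    print(N, leibniz_partial(N), euler_partial_product(N), math.pi / 4)
```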


Appendix 5G. Searching for 
primes and prime formulas 


5.21. Searching for prime formulas 


Proposition 5.7.1 proves that there is no (non-constant) polynomial that takes only prime values, and exercise 5.7.1 says the same thing for polynomials in more than one variable. But perhaps there is a more exotic formula than mere polynomials which yields only primes? Earlier we discussed the Fermat numbers, 2^{2^n} + 1, which Fermat had mistakenly believed to all be prime, but maybe there is some other formula? One intriguing possibility stems from the fact that

$$2^2 - 1, \quad 2^{2^2 - 1} - 1, \quad 2^{2^{2^2 - 1} - 1} - 1 \quad \text{and} \quad 2^{2^{2^{2^2 - 1} - 1} - 1} - 1$$

(that is, 3, 7, 127 and 2^{127} − 1) are all prime. Could every term in this sequence be prime? No one knows, and the next example is so large that no one will be able to determine whether or not it is prime in the foreseeable future. (Draw lessons on the power of computation from this example!)
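The four numbers displayed are Mersenne numbers 2^p − 1 with p = 2, 3, 7 and 127, so their primality can be confirmed with the Lucas–Lehmer test, sketched here in Python (names ours). The next term would have the 39-digit number 2^{127} − 1 as its exponent, far beyond any computation.

```python
def lucas_lehmer(p):
    """For a prime p, decide whether M_p = 2^p - 1 is prime (Lucas-Lehmer test)."""
    if p == 2:
        return True                    # M_2 = 3
    M = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M
    return s == 0

# c_0 = 2 and c_{n+1} = 2^{c_n} - 1; stop before the astronomically large next term.
c = [2]
for _ in range(4):
    c.append(2 ** c[-1] - 1)
print(c[:4], len(str(c[4])))
for exponent in c[:4]:
    print(exponent, lucas_lehmer(exponent))
```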

With a little imagination it is not so difficult to develop formulas that easily yield all of the primes. For example, if p_1 = 2 < p_2 = 3 < ··· is the sequence of primes, then define

$$\alpha = \sum_{m \ge 1} \frac{p_m}{10^{m^2}} = .2003000050000007000000011\ldots.$$

One can read off the primes from the decimal expansion of α, the mth prime coming from the few digits ending at the (m²)th digit, or, more formally,

$$p_m = \lfloor 10^{m^2} \alpha \rfloor - 10^{2m-1} \lfloor 10^{(m-1)^2} \alpha \rfloor.$$

Is α truly interesting? If one could easily describe α, other than by the definition that we gave, then it might provide an easy way to determine the primes. But with its artificial definition it does not seem like it can be used in any practical way. There are other such constructions.
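The digit-reading formula is exact integer arithmetic. In the Python sketch below (names ours) we scale the first M terms of α by 10^{M²}, which gives an integer, and apply the formula for p_m with integer division in place of the floor function. Since the omitted tail of the series is smaller than 10^{−M²}, these floors agree with those for α itself when m ≤ M.

```python
def first_primes(count):
    """The first `count` primes, by trial division against earlier primes."""
    primes = []
    n = 2
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes

def alpha_scaled(primes):
    """A = 10^(M^2) * sum_{m <= M} p_m / 10^(m^2), an exact integer (M = len(primes));
    its M^2 digits begin the decimal expansion of alpha."""
    M = len(primes)
    return sum(p * 10 ** (M * M - m * m) for m, p in enumerate(primes, start=1))

def read_prime(A, M, m):
    """p_m = floor(10^(m^2) alpha) - 10^(2m-1) floor(10^((m-1)^2) alpha), using A."""
    hi = A // 10 ** (M * M - m * m)
    lo = A // 10 ** (M * M - (m - 1) ** 2)
    return hi - 10 ** (2 * m - 1) * lo

ps = first_primes(10)
A = alpha_scaled(ps)
print("." + str(A)[:25])
print([read_prime(A, len(ps), m) for m in range(1, len(ps) + 1)])
```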




In a rather different vein, Matijasevič, while working on Hilbert’s tenth problem, discovered that there exist polynomials f in many variables such that the set of positive values taken by f, when each variable is set to be a non-negative integer, is precisely the set of primes.³³ One can find many different polynomials for the primes; we will give one with 26 variables of degree 25. (One can cut the degree to as low as 5 at the cost of having an enormous number of variables. No one knows the minimum possible degree nor the minimum possible number of variables.) Our polynomial is k + 2 times


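$$\begin{aligned}
1 &- (n + l + v - y)^2 - (2n + p + q + z - e)^2 - (wz + h + j - q)^2 \\
&- \big((gk + 2g + k + 1)(h + j) + h - z\big)^2 - \big(z + pl(a - p) + t(2ap - p^2 - 1) - pm\big)^2 \\
&- \big(p + l(a - n - 1) + b(2an + 2a - n^2 - 2n - 2) - m\big)^2 - \big((a^2 - 1)l^2 + 1 - m^2\big)^2 \\
&- \big(q + y(a - p - 1) + s(2ap + 2a - p^2 - 2p - 2) - x\big)^2 - \big((a^2 - 1)y^2 + 1 - x^2\big)^2 \\
&- \big(16(k + 1)^3(k + 2)(n + 1)^2 + 1 - f^2\big)^2 - \big(e^3(e + 2)(a + 1)^2 + 1 - o^2\big)^2 \\
&- \big(16r^2y^4(a^2 - 1) + 1 - u^2\big)^2 - (ai + k + 1 - l - i)^2 \\
&- \big(((a + u^2(u^2 - a))^2 - 1)(n + 4dy)^2 + 1 - (x + cu)^2\big)^2.
\end{aligned}$$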
Stare at this for a while and try to figure out how it works: The key is to determine when the displayed polynomial takes positive values. Note that it is equal to 1 minus a sum of squares so, if the polynomial is positive, with k + 2 > 0, then the second factor must equal 1 and therefore each of the squares must equal 0, so that

n + l + v − y = 2n + p + q + z − e = wz + h + j − q = ··· = 0.


Understanding much beyond this seems difficult, and it seems that the only way 
to appreciate this polynomial is to understand its derivation; see [I]. In the cur- 
rent state of knowledge it seems that this absolutely extraordinary and beautiful 
polynomial is entirely useless in helping us better understand the distribution of 
primes! 


5.22. Conway’s prime producing machine 


Begin with the integer 2 and multiply it by the first fraction in the list

17/91, 78/85, 19/51, 23/38, 29/33, 77/29, 95/23, 77/19, 1/17, 11/13, 13/11, 15/14, 15/2, 55/1

for which the product is an integer. So we have 2 × 15/2, giving 15. Repeat the process with this product, 15, and continue over and over again. One obtains the integers

2, 15, 825, 725, 1925, 2275, 425, 390, 330, 290, 770, ....

Other than the first 2, the powers of 2 on the list are 2², 2³, 2⁵, 2⁷, 2¹¹, 2¹³, 2¹⁷, .... In other words, if 2^k appears, then k is prime, and one can even show that 2^p appears for every prime p. This is an extraordinary way to find the primes. It is a challenge to determine why this works. (Hint: Study how the exponent of each prime dividing our integer varies as it is multiplied with each fraction in the list.)
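Conway’s machine is easy to simulate with exact integer arithmetic. The Python sketch below (names ours) runs 20,000 steps from 2 and records the exponents k of the powers 2^k that appear.

```python
# Conway's fractions, in order, as (numerator, denominator) pairs.
FRACTIONS = [(17, 91), (78, 85), (19, 51), (23, 38), (29, 33), (77, 29), (95, 23),
             (77, 19), (1, 17), (11, 13), (13, 11), (15, 14), (15, 2), (55, 1)]

def step(n):
    """Multiply n by the first fraction that gives an integer (55/1 always does)."""
    for num, den in FRACTIONS:
        if (n * num) % den == 0:
            return n * num // den
    raise AssertionError("unreachable: 55/1 always applies")

def run(start=2, steps=20000):
    seq = [start]
    for _ in range(steps):
        seq.append(step(seq[-1]))
    return seq

seq = run()
print(seq[:11])
# Exponents k of the powers 2^k (k >= 1) reached after the starting value 2.
exponents = [v.bit_length() - 1 for v in seq[1:] if v > 1 and v & (v - 1) == 0]
print(exponents)
```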


33 One can also construct such polynomials so as to yield the set of Fibonacci numbers or the set of Fermat primes or the set of Mersenne primes or the set of even perfect numbers (see section 4.2) and indeed any Diophantine set (see [1]).




5.23. Ulam’s spiral 


Ulam’s idea was to write the integers in a spiral starting from 1, marking the prime numbers and looking for patterns. Here is the spiral up to 151:

[Figure: Ulam’s spiral of the integers from 1 to 151, with the primes encircled.]

Figure 5.2. Ulam’s spiral up to 151 with the primes encircled.


Ulam spirals capturing many more integers can be found online. 


Ulam observed that there are diagonals with lots of primes, for example the rising diagonal

79, 47, 23, 7, 19, 39, 67, 103, 147.

Starting from the 7, the numbers on the descending diagonal from 7 to the left are the values of the polynomial 4n² + 4n − 1 with n = 1, 2, 3, ..., while the numbers on the rising diagonal from 7 to the right are the values of the polynomial 4n² + 3 with n = 1, 2, 3, ....

Likewise the descending diagonal from 11 contains the primes 11, 29, 89, 131; this diagonal contains the values of the polynomial 4n² + 6n + 1 with n = 1, 2, 3, ....

In an extended diagram one observes many prime-rich lines, for example a line which corresponds to the values of the polynomial 4x² + 170x + 1847 = X² + X + 41, where X = 2x + 42, as well as a line which corresponds to the values of 4x² + 4x + 59.

To better understand this diagram, let U(0,0) = 1, and let U(x,y) be the 
integer that is x to the right and y up from 1; for example, U(3,2) = 36. 


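Exercise 5.23.1. Prove that we have

$$U(x, y) = \begin{cases}
4x^2 - x + 1 + y & \text{if } -x \le y \le x \text{ with } x > 0, \\
4y^2 + y + 1 - x & \text{if } -y \le x \le y \text{ with } y > 0, \\
4x^2 - 3x + 1 - y & \text{if } -|x| \le y \le |x| \text{ with } x < 0, \\
4y^2 + 3y + 1 + x & \text{if } -|y| < x \le |y| \text{ with } y < 0.
\end{cases}$$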
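The formula in exercise 5.23.1 gives a quick way to draw the spiral and to check the diagonal polynomials mentioned above. In the Python sketch below (names ours), `U` implements the four cases, with a strict lower bound in the last case so that the corners (x, x) with x < 0 are handled by the third case. The function `show` prints the spiral up to 151, with the primes in parentheses.

```python
import math

def U(x, y):
    """The integer at position (x, y) of Ulam's spiral, via the four cases of exercise 5.23.1."""
    if x > 0 and -x <= y <= x:
        return 4 * x * x - x + 1 + y
    if y > 0 and -y <= x <= y:
        return 4 * y * y + y + 1 - x
    if x < 0 and -abs(x) <= y <= abs(x):
        return 4 * x * x - 3 * x + 1 - y
    if y < 0 and -abs(y) < x <= abs(y):
        return 4 * y * y + 3 * y + 1 + x
    return 1                            # only (0, 0) reaches this line

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def show(radius, limit):
    """Print the spiral on the square |x|, |y| <= radius, leaving entries > limit blank."""
    for y in range(radius, -radius - 1, -1):
        cells = []
        for x in range(-radius, radius + 1):
            v = U(x, y)
            if v > limit:
                cells.append("     ")
            elif is_prime(v):
                cells.append(f"({v:3d})")
            else:
                cells.append(f" {v:3d} ")
        print("".join(cells))

show(6, 151)
```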




A line that hits infinitely many points with integer coordinates takes the form ay = bx + c with a, b, c integers with (a, b) = 1. By Theorem the solutions all take the form x = r + na, y = s + nb, as n varies. For a line in the right quadrant, −a ≤ b ≤ a, so we are looking for prime values of the polynomial

f(n) = 4x² − x + 1 + y = An² + Bn + C with n ≥ 0,

where A = 4a², B = 8ar − a + b, C = 4r² − r + 1 + s, which has discriminant (b − a)² − 16a(a + c). Therefore we are looking for a quadratic polynomial with many prime values.

The examples we found earlier were on the diagonal, so that a = 1, b = ±1. As a = 1, we can take r = 0 and s = c, so that A = 4, B = b − 1, C = c + 1. For the polynomial f to take odd values we need C odd, so c even, and therefore the polynomial is

either 4n² + C or 4n² − 2n + C.

We will discuss “prime-rich” quadratic polynomials in chapter 12.

Exercise 5.23.2. Let three consecutive values of a quadratic polynomial f be f(n − 1) = u, f(n) = v, f(n + 1) = w. Prove that f has discriminant (2v − (u + w)/2)² − uw.
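Exercise 5.23.2 can be checked exactly with rational arithmetic. This Python sketch (names ours) computes the discriminant from three consecutive values and compares it with B² − 4AC, including for the polynomial 4x² + 170x + 1847 from the Ulam spiral, whose discriminant is 4 times that of X² + X + 41 since X = 2x + 42.

```python
from fractions import Fraction

def disc_from_values(u, v, w):
    """Discriminant of the quadratic f with f(n-1) = u, f(n) = v, f(n+1) = w:
    (2v - (u + w)/2)^2 - u w, as in exercise 5.23.2."""
    return (2 * v - Fraction(u + w, 2)) ** 2 - u * w

def disc(A, B, C):
    return B * B - 4 * A * C

def check(A, B, C, n):
    """Compare both discriminants, using the values of A t^2 + B t + C at n - 1, n, n + 1."""
    f = lambda t: A * t * t + B * t + C
    return disc_from_values(f(n - 1), f(n), f(n + 1)) == disc(A, B, C)

print(disc_from_values(41, 41, 43), disc(1, 1, 41))
print(check(4, 170, 1847, 5), disc(4, 170, 1847), 4 * disc(1, 1, 41))
```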


5.24. Mills’s formula 


Although Legendre’s conjecture that there is always a prime between consecutive squares is unresolved, we do know that there is a prime between every pair of consecutive cubes > N_0 (exercise 5.6.3). In 1947, Mills deduced from this that there exists a constant τ (now called Mills’s constant) such that

$$\lfloor \tau^{3^n} \rfloor \text{ is a prime } q_n \text{ for every integer } n \ge 0.$$

To prove this, let q_0 > N_0 be prime and then select q_{n+1} to be any prime for which

$$q_n^3 < q_{n+1} < (q_n + 1)^3 \quad \text{for all } n \ge 0.$$

If ℓ_n = q_n^{1/3^n} and u_n = (q_n + 1)^{1/3^n}, then, taking 3^{n+1}th roots in the line above,

$$\ell_n < \ell_{n+1} < \cdots < u_{n+1} < u_n.$$

The ℓ_n are therefore an increasing bounded sequence, so must tend to a limit; call it τ. Then ℓ_n ≤ τ ≤ u_n for all n, and so q_n ≤ τ^{3^n} ≤ q_n + 1. We cannot have equality in the upper bound, or else τ^{3^n} is an integer, and so τ^{3^m} is an integer for all m ≥ n, implying that q_m = τ^{3^m} − 1 for all m ≥ n, and in particular

$$q_{n+1} = (q_n + 1)^3 - 1 = q_n (q_n^2 + 3q_n + 3),$$

so is not prime. Therefore q_n ≤ τ^{3^n} < q_n + 1 for all n, and so q_n = ⌊τ^{3^n}⌋, as claimed.
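The construction can be run with small primes. In the Python sketch below (names ours) we start from q_0 = 2, rather than some q_0 > N_0, and take q_{n+1} to be the least prime above q_n³, asserting that it lies below (q_n + 1)³; for the first few steps such primes exist, which is all the sketch relies on. We then take τ to be the midpoint of [ℓ_n, u_n) for the last n, computed to 60 significant digits with Python’s `decimal` module, and check that ⌊τ^{3^n}⌋ recovers each q_n.

```python
from decimal import Decimal, getcontext

def is_prime(n):
    """Deterministic Miller-Rabin for n < 3.3 * 10**24 (bases: the primes up to 41)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def mills_primes(count, q0=2):
    """q_0 = q0, then q_{n+1} = least prime >= q_n^3, checked to be < (q_n + 1)^3."""
    qs = [q0]
    while len(qs) < count:
        q = qs[-1]
        c = q ** 3
        while not is_prime(c):
            c += 1
        assert c < (q + 1) ** 3, "no prime found between consecutive cubes"
        qs.append(c)
    return qs

def tau_estimate(qs, digits=60):
    """Midpoint of [l_n, u_n) for n = len(qs) - 1, where
    l_n = q_n^(1/3^n) and u_n = (q_n + 1)^(1/3^n)."""
    getcontext().prec = digits
    e = Decimal(3) ** (len(qs) - 1)
    lo = (Decimal(qs[-1]).ln() / e).exp()
    hi = (Decimal(qs[-1] + 1).ln() / e).exp()
    return (lo + hi) / 2

qs = mills_primes(4)
tau = tau_estimate(qs)
print(qs)
print(tau)
print([int(tau ** (3 ** n)) for n in range(len(qs))])
```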


Further reading on primes in surprising places 


[1] James P. Jones, Daihachiro Sato, Hideo Wada, and Douglas Wiens, Diophantine representation of the set of prime numbers, Amer. Math. Monthly 83 (1976), 449-464.

[2] M. L. Stein, S. M. Ulam, and M. B. Wells, Mathematical notes: A visual display of some properties of the distribution of primes, Amer. Math. Monthly 71 (1964), 516-520.


Appendix 5H. Dynamical 
systems and infinitely many 
primes 


We will show that various different polynomial dynamical systems each give rise to 
a different proof that there are infinitely many primes. 


5.25. A simpler formulation 


The sequences (a_n)_{n≥0} and (F_n)_{n≥0}, used to prove that there are infinitely many primes, can be determined by multiplying all of the terms of the sequence so far together and adding a constant. We can rewrite this as follows:

$$a_{n+1} = a_0 a_1 \cdots a_{n-1} \cdot a_n + 1 = (a_n - 1) a_n + 1 = f(a_n),$$

where f(x) = x² − x + 1. Similarly the F_n-values can be determined by

$$F_{n+1} = (2^{2^n} + 1)(2^{2^n} - 1) + 2 = F_n (F_n - 2) + 2 = f(F_n),$$

where f(x) = x² − 2x + 2. So they are both examples of sequences (x_n)_{n≥0} for which

$$x_{n+1} = f(x_n)$$

for some polynomial f(x) ∈ ℤ[x]. The terms of the sequence are all given by the recursive formula, so that

$$x_n = \underbrace{f(f(\cdots f}_{n \text{ times}}(x_0))) = f^n(x_0),$$

where the notation f^n denotes the polynomial obtained by composing f with itself n times (which is definitely not the nth power of f). Any such sequence (x_n)_{n≥0} is called the orbit of x_0 under the map f, since the sequence is completely determined once one knows x_0 and f. We sometimes write the orbit as x_0 → x_1 → x_2 → ···.




The key to the proof that the $a_n$'s are pairwise coprime is that $a_n \equiv 1 \pmod{a_m}$ whenever $n > m \ge 0$, and the key to the proof that the $F_n$'s are pairwise coprime is that $F_n \equiv 2 \pmod{F_m}$ whenever $n > m \ge 0$. These congruences are not difficult to deduce using Corollary 2.3.1. For $f(x) = x^2 - x + 1$, the orbit of 0 under the map $f$ is $0 \to 1 \to 1 \to \cdots$. Therefore if $n > m \ge 0$, then we have

$$a_n = f^{n-m}(a_m) \equiv f^{n-m}(0) = 1 \pmod{a_m},$$

and so $(a_n, a_m) = (1, a_m) = 1$. Similarly if $f(x) = x^2 - 2x + 2$, then the orbit of 0 under the map $f$ is $0 \to 2 \to 2 \to \cdots$ and so $F_n = f^{n-m}(F_m) \equiv f^{n-m}(0) = 2 \pmod{F_m}$, which implies that if $n > m$, then $(F_n, F_m) = (2, F_m) = 1$ as each $F_m$ is odd.
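Both congruences, and the resulting pairwise coprimality, are easy to check numerically on the first few terms. The following is a minimal Python sketch; the helper names `orbit` and `pairwise_coprime` are our own, and the check covers only the first six terms of each orbit.

```python
from itertools import combinations
from math import gcd


def orbit(f, x0, length):
    """Return the first `length` terms x0, f(x0), f(f(x0)), ... of the orbit of x0."""
    xs = [x0]
    while len(xs) < length:
        xs.append(f(xs[-1]))
    return xs


def pairwise_coprime(values):
    """True if every pair of entries (at different positions) has gcd 1."""
    return all(gcd(u, v) == 1 for u, v in combinations(values, 2))


a = orbit(lambda x: x * x - x + 1, 2, 6)       # a_0 = 2 under x^2 - x + 1
F = orbit(lambda x: x * x - 2 * x + 2, 3, 6)   # F_0 = 3 under x^2 - 2x + 2

# a_n = 1 (mod a_m) and F_n = 2 (mod F_m) whenever n > m >= 0
assert all(a[n] % a[m] == 1 for m, n in combinations(range(6), 2))
assert all(F[n] % F[m] == 2 for m, n in combinations(range(6), 2))

print(a)  # [2, 3, 7, 43, 1807, 3263443]
print(F)  # [3, 5, 17, 257, 65537, 4294967297]
print(pairwise_coprime(a), pairwise_coprime(F))  # True True
```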

This reformulation of two of the best-known proofs of the infinitude of primes 
hints at the possibility of a more general approach. 


Exercise 5.25.1. Show that if $f^m(a) = a$, then $f^{m+n}(a) = f^n(a)$ for all $n \ge 0$.


5.26. Different starting points 


We will use Lemma 5.1.1 to prove that there are infinitely many primes. This requires finding an infinite sequence of integers $a_1 < a_2 < \cdots$ that are pairwise coprime.


The orbit of 2 under the map $x \mapsto x^2 - x + 1$ is an infinite sequence of pairwise coprime integers, $2 \to 3 \to 7 \to 43 \to 1807 \to \cdots$. What about other orbits? The orbit $4 \to 13 \to 157 \to 24493 \to \cdots$ is also an infinite sequence of pairwise coprime integers, as is the orbit $5 \to 21 \to 421 \to \cdots$. The same proof as before yields that no two integers in a given orbit have a common factor.

The orbit of 3 under the map $x \mapsto x^2 - 2x + 2$ yielded the Fermat numbers $3 \to 5 \to 17 \to 257 \to \cdots$, but starting at 4 we get $4 \to 10 \to 82 \to 6562 \to \cdots$. These are obviously not pairwise coprime as every number in the orbit is even, but if we divide through by 2, then we get

$$2 \to 5 \to 41 \to 3281 \to \cdots,$$

which are pairwise coprime. To prove this, note that $x_0 = 4$ and $x_{n+1} = x_n^2 - 2x_n + 2$ for all $n \ge 0$, and so the above proof yields that if $m < n$, then $(x_m, x_n) = (x_m, 2) = 2$. Therefore, taking $a_n = x_n/2$ for every $n$ we deduce that $(a_m, a_n) = (x_m/2, x_n/2) = (x_m/2, 1) = 1$, since $x_n \equiv 2 \pmod{x_m}$ gives $x_n/2 \equiv 1 \pmod{x_m/2}$. This same idea works for every orbit under this map: We get an infinite sequence of pairwise coprime integers by dividing through by 1 or 2, depending on whether $x_0$ is odd or even.
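A quick numerical check of this halving trick, as a minimal Python sketch (the function name `halved_orbit` is ours). It relies on the identity $x^2 - 2x + 2 = (x - 1)^2 + 1$, which maps even integers to even integers and odd to odd, so every term has the parity of $x_0$ and the division is always exact.

```python
from itertools import combinations
from math import gcd


def halved_orbit(x0, length):
    """Orbit of x0 under x -> x^2 - 2x + 2, divided through by 1 or 2.

    Since x^2 - 2x + 2 = (x - 1)^2 + 1 preserves parity, every term has the
    parity of x0, so dividing by d = 2 (x0 even) or d = 1 (x0 odd) is exact.
    """
    d = 2 if x0 % 2 == 0 else 1
    xs = [x0]
    while len(xs) < length:
        xs.append(xs[-1] * xs[-1] - 2 * xs[-1] + 2)
    return [x // d for x in xs]


seq = halved_orbit(4, 5)
print(seq)  # [2, 5, 41, 3281, 21523361]
print(all(gcd(u, v) == 1 for u, v in combinations(seq, 2)))  # True
```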


Exercise 5.26.1. Perform a similar analysis of the map $x \mapsto x^2 - 2$ beginning by studying the orbit of 0. (The orbit of 4 under this map is shown, in the Lucas-Lehmer test (Corollary 10.10.1), to provide an efficient way to test whether a given Mersenne number is prime.)


However, things can be more complicated: Consider the orbit of $x_0 = 3$ under the map $x \mapsto x^2 - 6x - 1$. We have

$$3 \to -10 \to 159 \to 24326 \to 591608319 \to \cdots.$$

Here $x_n$ is divisible by 3 if $n$ is even, and it is divisible by 2 if $n$ is odd. If we let $a_n = x_n/3$ when $n$ is even, and $a_n = x_n/2$ when $n$ is odd, then one can show the terms of the resulting sequence,

$$1 \to -5 \to 53 \to 12163 \to 197202773 \to \cdots,$$

are indeed pairwise coprime.
Another surprising example is given by the orbit of 6 under the map $x \mapsto 7 + x^5(x - 1)(x - 7)$. Reducing the elements of the orbit mod 7 we find that

$$6 \to 5 \to 4 \to 3 \to 2 \to 1 \to 0 \to 0 \to \cdots \pmod 7,$$

as $x^5(x - 1)(x - 7) \equiv x^6(x - 1) \pmod 7$, which is $\equiv x - 1 \pmod 7$ if $x \not\equiv 0 \pmod 7$ by Fermat's little theorem. So $x_n$ is divisible by 7 for every $n \ge 6$, but for no smaller $n$, and to obtain the pairwise coprime $a_n$ we let $a_n = x_n$ for $n \le 5$ and $a_n = x_n/7$ once $n \ge 6$.
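The residues can be confirmed by brute force. This minimal Python sketch computes the exact orbit $x_0, \ldots, x_6$ with Python's arbitrary-precision integers (by $x_6$ the terms already run to tens of thousands of digits) and then reduces each term mod 7. Only reductions are printed; converting a term this large to a decimal string can hit the integer-to-string digit limit in recent Python versions.

```python
def f(x):
    """The map x -> 7 + x^5 (x - 1)(x - 7) from the text."""
    return 7 + x**5 * (x - 1) * (x - 7)


xs = [6]
for _ in range(6):                 # exact values x_0, ..., x_6
    xs.append(f(xs[-1]))

residues = [x % 7 for x in xs]     # Python's % always returns a value in 0..6
print(xs[:2])                      # [6, -38873]
print(residues)                    # [6, 5, 4, 3, 2, 1, 0]
print(next(n for n, r in enumerate(residues) if r == 0))   # 6
```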


It starts to look as though it might become complicated to formulate how to define a sequence $(a_n)_{n\ge 0}$ of pairwise coprime integers in general; certainly a case-by-case description is unappealing. However, there is a simpler way to obtain the $a_n$: In these last two examples we have $a_n = x_n/(x_n, 6)$ and then $a_n = x_n/(x_n, 7)$, respectively, for all $n \ge 0$, a description that will generalize well.
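For the quadratic example, the recipe $a_n = x_n/(x_n, 6)$ can be tested directly. A minimal Python sketch, checking only the first five terms:

```python
from itertools import combinations
from math import gcd


def f(x):
    return x * x - 6 * x - 1


xs = [3]
for _ in range(4):
    xs.append(f(xs[-1]))

# math.gcd returns a non-negative value, so each a_n keeps the sign of x_n
a = [x // gcd(x, 6) for x in xs]
print(xs)  # [3, -10, 159, 24326, 591608319]
print(a)   # [1, -5, 53, 12163, 197202773]
print(all(gcd(u, v) == 1 for u, v in combinations(a, 2)))  # True
```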


5.27. Dynamical systems and the infinitude of primes 


One models evolution by determining the future development of the object of study from its current state (more sophisticated models incorporate the possibility of random mutations). This gives rise to dynamical systems, a rich and bountiful area of study. One simple model is that the state of the object at time $n$ is denoted by $x_n$, and given an initial state $x_0$, one can find subsequent states via a map $x_n \mapsto f(x_n) = x_{n+1}$ for some given function $f(\cdot)$. Orbits of linear polynomials $f(\cdot)$ are easy to understand,⁴ but quadratic polynomials can give rise to evolution that is very far from what one might naively guess (the reader might look into the extraordinary Mandelbrot set).


This is the set-up above, where $f(x)$ is a polynomial with integer coefficients and $x_0$ an integer. It will be useful to use dynamical systems terminology.

If $f^n(a) = a$ for some integer $n \ge 1$, then (the orbit of) $a$ is periodic, and the smallest such $n$ is the exact period length for $a$. The orbit begins with the cycle

$$a \to f(a) \to f^2(a) \to \cdots \to f^{n-1}(a)$$

of distinct values and repeats itself, so that $f^n(a) = a$, $f^{n+1}(a) = f(a), \ldots$, and, in general, $f^{n+k}(a) = f^k(a)$ for all $k \ge 0$.

The number $a$ is preperiodic if $f^m(a)$ is periodic for some $m \ge 0$ and is strictly preperiodic if $a$ is preperiodic but not itself periodic. In all of our examples so far, 0 has been strictly preperiodic. In fact, if any two elements of the orbit of $a$ are equal, say $f^{m+n}(a) = f^m(a)$ with $n \ge 1$, then $f^{k+n}(a) = f^{k-m}(f^{m+n}(a)) = f^{k-m}(f^m(a)) = f^k(a)$ for all $k \ge m$, so that $a$ is preperiodic.

Finally, $a$ has a wandering orbit if it is not preperiodic, that is, if its orbit never repeats itself so that the $\{f^m(a)\}_{m\ge 0}$ are all distinct. Therefore, we wish to start only with integers $x_0$ that have wandering orbits.
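These three notions can be explored experimentally. The sketch below is our own heuristic classifier, not a procedure from the text: it looks for the first repeated value within `max_steps` iterations, and gives up (reporting a probable wandering orbit) once the orbit leaves the window $[-10^{12}, 10^{12}]$. No finite computation can certify that an orbit never repeats, so the last verdict is only a guess.

```python
def classify(f, a, max_steps=50, escape=10**12):
    """Heuristically classify the orbit of the integer a under f.

    Returns ("periodic", n) with n the exact period length,
    ("strictly preperiodic", (m, n)) if f^m(a) is the first term lying on a
    cycle of exact period n, or ("wandering?", None) if no repeat is seen
    within max_steps iterations or |f^k(a)| exceeds `escape` (a guess only).
    """
    seen = {}                  # value -> index k of its first appearance f^k(a)
    x = a
    for k in range(max_steps):
        if x in seen:
            m = seen[x]        # first repeat: f^k(a) = f^m(a) with m < k
            n = k - m
            return ("periodic", n) if m == 0 else ("strictly preperiodic", (m, n))
        if abs(x) > escape:
            return ("wandering?", None)
        seen[x] = k
        x = f(x)
    return ("wandering?", None)


print(classify(lambda x: x * x - x + 1, 0))   # ('strictly preperiodic', (1, 1))
print(classify(lambda x: x * x - 1, 0))       # ('periodic', 2)
print(classify(lambda x: x * x - x + 1, 2))   # ('wandering?', None)
```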


4 One can verify: If $f(t) = at + b$, then $x_n = x_0 + nb$ if $a = 1$, and $x_n = a^n x_0 + \frac{a^n - 1}{a - 1}\, b$ if $a \ne 1$.


Our general result for constructing infinitely many primes from orbits of a polynomial map is the following.

Theorem 5.9. Suppose that $f(x) \in \mathbb{Z}[x]$ and that 0 is a strictly preperiodic point of the map $x \mapsto f(x)$. Let $\ell(f) = \operatorname{lcm}[f(0), f^2(0)]$. For any integer $x_0$ that has a wandering orbit, $(x_n)_{n\ge 0}$, let

$$a_n = \frac{x_n}{\gcd(x_n, \ell(f))} \quad \text{for all } n \ge 0.$$

Then the $(a_n)_{n\ge 0}$ are an infinite sequence of pairwise coprime integers.


One can prove that $|a_n| > 1$ for all $n \ge 3$ (see [1]), and so all such $a_n$ have a private prime factor.⁵ The example $f(x) = 3 - x(x - 3)^2$ with $0 \to 3 \to 3$ has a wandering orbit $2 \to 1 \to -1 \to \cdots$, so that if $x_0 = 2$, then $x_1 = 1$ and $x_2 = -1$. Therefore we cannot in general improve the lower bound, $n \ge 3$.

We have $\ell(f) = 1$, 2, 6, and 7, respectively, for the four polynomials $f(x)$ of section 5.26 of appendix 5H.

In exercise 5.28.2 we will determine which polynomials $f$ satisfy the hypothesis of Theorem 5.9, that is, those $f$ for which 0 is a strictly preperiodic point.


Proof. Suppose that $k = n - m > 0$. Then, by Corollary 2.3.1,

$$x_n = f^k(x_m) \equiv f^k(0) \pmod{x_m},$$

and so $\gcd(x_m, x_n)$ divides $\gcd(x_m, f^k(0))$, which divides $f^k(0)$. But this divides

$$L(f) := \operatorname{lcm}[f^k(0) : k \ge 1],$$

which is the lcm of a finite number of nonzero integers, as 0 is strictly preperiodic (so the $f^k(0)$ take only finitely many values, none of them 0). Therefore $(x_m, x_n)$ divides $L(f)$, and so $(x_m, x_n)$ divides both $(x_m, L(f))$ and $(x_n, L(f))$. This implies that $A_m := x_m/(x_m, L(f))$ divides $x_m/(x_m, x_n)$ and $A_n$ divides $x_n/(x_m, x_n)$. But $x_m/(x_m, x_n)$ and $x_n/(x_m, x_n)$ are coprime, which therefore implies that $(A_m, A_n) = 1$.

We will show in exercise 5.28.4(b) that $L(f) = \ell(f)$; that is, the lcm of all the elements of the orbit of 0 is the same as the lcm of the first two terms. This then implies that $a_n = A_n$ for all $n$.
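Theorem 5.9 translates directly into code. In this minimal Python sketch the name `coprime_sequence` is ours; it does not check the hypotheses (that 0 is strictly preperiodic for $f$ and that $x_0$ wanders), and it uses $\ell(f)$ in place of $L(f)$, relying on exercise 5.28.4(b). The example reproduces the orbit $2 \to 1 \to -1 \to \cdots$ of $f(x) = 3 - x(x - 3)^2$ discussed above.

```python
from itertools import combinations
from math import gcd


def lcm(u, v):
    return abs(u * v) // gcd(u, v)


def coprime_sequence(f, x0, length):
    """First `length` terms of a_n = x_n / gcd(x_n, l(f)), l(f) = lcm[f(0), f^2(0)].

    The hypotheses of Theorem 5.9 (0 strictly preperiodic for f, x0 wandering)
    are assumed, not checked.
    """
    ell = lcm(f(0), f(f(0)))
    xs = [x0]
    while len(xs) < length:
        xs.append(f(xs[-1]))
    return [x // gcd(x, ell) for x in xs]


def cubic(x):
    # the example f(x) = 3 - x(x - 3)^2 from the text, with 0 -> 3 -> 3
    return 3 - x * (x - 3) ** 2


a = coprime_sequence(cubic, 2, 5)
print(a)                                                    # [2, 1, -1, 19, -4861]
print(all(gcd(u, v) == 1 for u, v in combinations(a, 2)))   # True
```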


5.28. Polynomial maps for which 0 is strictly preperiodic 


We have already seen the examples $x^2 - x + 1$, for which $0 \to 1 \to 1$, and $x^2 - 6x - 1$, for which $0 \to -1 \to 6 \to -1$, where 0 is preperiodic with period length one and two, respectively. In fact exact periods cannot be any larger:


Lemma 5.28.1. Let $f(x) \in \mathbb{Z}[x]$. If the orbit of $a_0$ is periodic, then its exact period length is either one or two.


Proof. Let $N$ be the exact period length so that $a_N = a_0$, and then $a_{N+k} = f^k(a_N) = f^k(a_0) = a_k$ for all $k \ge 1$. Now assume that $N > 1$ so that $a_1 \ne a_0$. Corollary 2.3.1 implies that $a_{n+1} - a_n$ divides $f(a_{n+1}) - f(a_n) = a_{n+2} - a_{n+1}$ for all $n \ge 0$. Therefore,

$a_1 - a_0$ divides $a_2 - a_1$, which divides $a_3 - a_2$, $\ldots$, which divides $a_N - a_{N-1} = a_0 - a_{N-1}$; and this divides $a_{N+1} - a_N = a_1 - a_0$,

the nonzero number we started with. Therefore each $a_{j+1} - a_j = \pm(a_1 - a_0)$ by exercise 1.1.1(b). This implies that there must be some $j \ge 1$ for which $a_{j+1} - a_j = -(a_j - a_{j-1})$, or else each $a_{j+1} - a_j = a_1 - a_0$ and so

$$0 = a_N - a_0 = \sum_{j=0}^{N-1} (a_{j+1} - a_j) = \sum_{j=0}^{N-1} (a_1 - a_0) = N(a_1 - a_0) \ne 0,$$

a contradiction. Therefore $a_{j+1} = a_{j-1}$, and we deduce that

$$a_2 = a_{N+2} = f^{N+1-j}(a_{j+1}) = f^{N+1-j}(a_{j-1}) = a_N = a_0,$$

so that $N \le 2$, as desired.

5 $p$ is a private prime factor of $a_n$ if $p$ divides $a_n$ but no other $a_m$.
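The lemma can be probed by an exhaustive small search. The following minimal Python sketch is our own experiment, not part of the proof: it runs over all polynomials $c_0 + c_1x + c_2x^2 + c_3x^3$ with coefficients in $[-3, 3]$ and starting points in $[-5, 5]$ (both ranges are arbitrary), and records the exact period of every cycle it detects. Orbits that leave $[-10^6, 10^6]$ or do not repeat within 30 steps are simply skipped, so this is evidence, not proof.

```python
from itertools import product


def cycle_length(f, x0, max_steps=30, bound=10**6):
    """Exact period of the cycle that the orbit of x0 eventually enters.

    Returns None if no value repeats within `max_steps` iterations or the orbit
    leaves [-bound, bound]; both cut-offs are arbitrary choices for this search.
    """
    seen = {}                      # value -> index of its first appearance
    x = x0
    for n in range(max_steps):
        if x in seen:
            return n - seen[x]     # the first repeat gives the exact period
        if abs(x) > bound:
            return None
        seen[x] = n
        x = f(x)
    return None


periods = set()
coefficients = range(-3, 4)
for c0, c1, c2, c3 in product(coefficients, repeat=4):
    def f(x, c0=c0, c1=c1, c2=c2, c3=c3):
        # f(x) = c0 + c1 x + c2 x^2 + c3 x^3
        return c0 + c1 * x + c2 * x * x + c3 * x ** 3

    for x0 in range(-5, 6):
        length = cycle_length(f, x0)
        if length is not None:
            periods.add(length)

print(sorted(periods))   # [1, 2]
```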


One can show that there are just four possibilities for the period length and the preperiod length of the orbit of 0 (yielding the polynomials which satisfy the hypothesis of Theorem 5.9). We now give examples of each type. For some non-zero integer $a$, the orbit of 0 is one of the following:

$0 \to a \to a \to \cdots$ for $x \mapsto x^2 - ax + a$ (e.g., $(a_n)_{n\ge 0}$ and $(F_n)_{n\ge 0}$); or
$0 \to -a \to a \to a \to \cdots$ for $a = 1$ or 2, with $x \mapsto 2x^2 - 1$ or $x^2 - 2$, respectively; or
$0 \to -1 \to a \to -1 \to \cdots$ with $f(x) = x^2 - ax - 1$; or
$0 \to 1 \to 2 \to -1 \to 2 \to -1 \to \cdots$ with $f(x) = 1 + x + x^2 - x^3$; and

all other possible orbits are obtained from these, by using the observation that if $0 \to a \to b \to \cdots$ is an orbit for $x \mapsto f(x)$, then $0 \to -a \to -b \to \cdots$ is an orbit for $x \mapsto g(x)$ where $g(x) = -f(-x)$.
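Each of the four orbit shapes, and the reflection $g(x) = -f(-x)$, can be checked by iterating. In this minimal Python sketch the values $a = 5$ and $a = 6$ in the first and third families are arbitrary choices; the loop also tests the criterion $f^2(0) = f^4(0) \ne 0$ from exercise 5.28.4(a). By contrast, $x^2 - 1$ gives the periodic orbit $0 \to -1 \to 0$, which fails that criterion.

```python
def orbit(f, x0, length):
    xs = [x0]
    while len(xs) < length:
        xs.append(f(xs[-1]))
    return xs


def reflect(f):
    """g(x) = -f(-x): its orbit of 0 is the negated orbit of 0 under f."""
    return lambda x: -f(-x)


examples = {
    "x^2 - 5x + 5 (a = 5)": lambda x: x * x - 5 * x + 5,
    "2x^2 - 1 (a = 1)": lambda x: 2 * x * x - 1,
    "x^2 - 2 (a = 2)": lambda x: x * x - 2,
    "x^2 - 6x - 1 (a = 6)": lambda x: x * x - 6 * x - 1,
    "1 + x + x^2 - x^3": lambda x: 1 + x + x * x - x**3,
}

for name, f in examples.items():
    xs = orbit(f, 0, 6)
    assert xs[2] == xs[4] != 0                      # f^2(0) = f^4(0) != 0
    assert orbit(reflect(f), 0, 6) == [-x for x in xs]
    print(f"{name:22} {xs}")
```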


Exercise 5.28.1. Suppose that $f(x)$ has an orbit $x_0 \to x_1 \to \cdots$. Let $g(x) = f(x + a) - a$. Prove that $g(x)$ has an orbit $x_0 - a \to x_1 - a \to x_2 - a \to \cdots$.


Exercise 5.28.2. Find every $f(x) \in \mathbb{Z}[x]$ with each of the four orbits above. (As an example, $f_0(x) = a$ gives $0 \to a \to a$, so $f(x)$ is another with this orbit if and only if 0 and $a$ are roots of $f(x) - f_0(x)$; that is, $f(x) - f_0(x) = x(x - a)g(x)$ for some $g(x) \in \mathbb{Z}[x]$.)
Exercise 5.28.3.† Prove that the four orbits above are the only possible ones.


Exercise 5.28.4. Let $f(x) \in \mathbb{Z}[x]$ and deduce from our classification of possible orbits:
(a) 0 is strictly preperiodic if and only if $f^2(0) = f^4(0) \ne 0$;
(b) $L(f) = \operatorname{lcm}[f(0), f^2(0)] = \ell(f)$ (as claimed in the proof of Theorem 5.9);
(c) $x_0$ has a wandering orbit if and only if $x_2 \ne x_4$.


Exercise 5.28.5. Suppose that $u_0 \in \mathbb{C}$ has period $p$ under the map $x \mapsto f(x)$ where $f(x) \in \mathbb{Z}[x]$, so that $u_0$ is a root of the polynomial $f^p(x) - x$. Prove that if $f$ is monic, then rT is a unit for all $0 \le i < j \le p - 1$.


We have been iterating the map $n \mapsto f(n)$ where $f(t) \in \mathbb{Z}[t]$. If one allows $f(t) \in \mathbb{Q}[t]$, then it is an open question as to the possible period lengths in the integers. Even the simplest case, $f(x) = x^2 + c$, with $c \in \mathbb{Q}$, is not only open but leads to the magnificent world of dynamical systems (see [2]). We would like to know what primes divide the numerators when iterating, starting from a given integer.


References for this chapter 


[1] Andrew Granville, Using dynamical systems to construct infinitely many primes, Amer. Math. 
Monthly 125 (2018), 483-496 (preprint). 


[2] J. H. Silverman, The arithmetic of dynamical systems, Graduate Texts in Mathematics 241, 
Springer, New York, 2007. 


Chapter 6 


Diophantine problems 


Diophantine equations are polynomial equations in which we study the integer or rational solutions. They are named after Diophantus (who lived in Alexandria in the third century A.D.) who wrote up his understanding of such equations in his thirteen-volume Arithmetica (though only six part-volumes survive today). This work was largely forgotten until interest was revived by Bachet's 1621 translation of Arithmetica into Latin.¹


6.1. The Pythagorean equation 


Right-angled triangles with sides 3, 4, 5 and 5, 12, 13, etc., were known to the ancient Babylonians. We wish to determine all right-angled triangles with integer sides, which amounts to finding all solutions in positive integers $x, y, z$ to the Pythagorean equation

$$x^2 + y^2 = z^2.$$

Note that $z > x, y > 0$ as $x$, $y$, and $z$ are all positive. We can reduce the problem, without loss of generality, so as to work with some convenient assumptions:

• That $x$, $y$, and $z$ are pairwise coprime, by dividing through by their gcd, as in exercise 1.7.8.

• That $x$ is even and $y$ is odd, and therefore that $z$ is odd: First note that $x$ and $y$ cannot both be even, since $x$, $y$, and $z$ are pairwise coprime; nor both odd, by exercise 2.5.6(b). Hence one of $x$ and $y$ is even, the other odd, and we interchange them, if necessary, to ensure that $x$ is even and $y$ is odd.


Under these assumptions we reorganize the equation and factor to get

$$(z - y)(z + y) = z^2 - y^2 = x^2.$$

1 Translations of various ancient Greek texts into Latin helped inspire the Renaissance.


We now prove that $(z - y, z + y) = 2$: We observe that $(z - y, z + y)$ divides $(z + y) - (z - y) = 2y$ and $(z + y) + (z - y) = 2z$, and that $(2y, 2z) = 2(y, z) = 2$. Therefore $(z - y, z + y)$ divides 2, and so equals 2 as $z - y$ and $z + y$ are both even.

Therefore, since $(z - y)(z + y) = x^2$ and $(z - y, z + y) = 2$, there exist integers $r, s$ such that

$$z - y = 2s^2 \text{ and } z + y = 2r^2; \quad \text{or} \quad z - y = -2s^2 \text{ and } z + y = -2r^2,$$

by exercise c). The second case is impossible since $r^2$, $y$, and $z$ are all positive.
From the first case we deduce that

$$x = 2rs, \quad y = r^2 - s^2, \quad \text{and} \quad z = r^2 + s^2.$$

To ensure that $x$, $y$, and $z$ are pairwise coprime we need $(r, s) = 1$ and $r + s$ odd. If we now multiply back in any common factors, we get the general solution

$$x = 2grs, \quad y = g(r^2 - s^2), \quad \text{and} \quad z = g(r^2 + s^2). \tag{6.1.1}$$

If we want an actual triangle, then the side lengths should all be positive so we may assume that $g > 0$ and $r > s > 0$, as well as $(r, s) = 1$ and $r$ and $s$ having different parities.² The reader should verify that the integers $x$, $y$, and $z$ given by this parametrization always satisfy the Pythagorean equation.
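The parametrization (6.1.1) is easy to turn into a generator of primitive triples. A minimal Python sketch (the function name is ours), taking $g = 1$ and enumerating $r > s > 0$ with $(r, s) = 1$ and $r - s$ odd:

```python
from math import gcd


def primitive_triples(r_max):
    """(x, y, z) = (2rs, r^2 - s^2, r^2 + s^2) from (6.1.1) with g = 1,
    over r_max >= r > s > 0 with gcd(r, s) = 1 and r, s of opposite parity."""
    triples = []
    for r in range(2, r_max + 1):
        for s in range(1, r):
            if gcd(r, s) == 1 and (r - s) % 2 == 1:
                triples.append((2 * r * s, r * r - s * s, r * r + s * s))
    return triples


print(primitive_triples(4))   # [(4, 3, 5), (12, 5, 13), (8, 15, 17), (24, 7, 25)]
```

Multiplying each triple by $g > 0$ gives the remaining, non-primitive, solutions.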


Figure 6.1. Parameterization of all integer-sided right-angled triangles.


One can also give a nice geometric proof of the parametrization in (6.1.1). We 
start with a reformulation of the question. 


Exercise 6.1.1. Prove that the integer solutions to $x^2 + y^2 = z^2$ with $z > 0$ and $(x, y, z) = 1$ are in 1-to-1 correspondence with the rational solutions $u, v$ to $u^2 + v^2 = 1$.


Where else does a line going through $(1, 0)$ intersect the circle $x^2 + y^2 = 1$? Unless the line is vertical it will hit the unit circle in exactly one other point, which we will denote by $(u, v)$. Note that $u < 1$. If the line has slope $t$, then $t = v/(u - 1)$ is rational if $u$ and $v$ are.

2 That is, one is even, the other is odd.

Figure 6.2. A line through $(1, 0)$ on the circle $x^2 + y^2 = 1$.


In the other direction, the line through $(1, 0)$ of slope $t$ is $y = t(x - 1)$, which intersects $x^2 + y^2 = 1$ where $1 - x^2 = y^2 = t^2(x - 1)^2$, so that either $x = 1$ and $y = 0$, or we have $1 + x = t^2(1 - x)$, which yields the point $(u, v)$ with

$$u = \frac{t^2 - 1}{t^2 + 1} \quad \text{and} \quad v = \frac{-2t}{t^2 + 1}.$$

These are both rational if $t$ is. We have therefore proved that $u, v \in \mathbb{Q}$ if and only if $t \in \mathbb{Q}$. In other words the line of slope $t$ through $(1, 0)$ hits the unit circle again at another rational point if and only if $t$ is rational, and then we can classify those points in terms of $t$. Therefore, writing $t = -r/s$ where $(r, s) = 1$, we have

$$u = \frac{r^2 - s^2}{r^2 + s^2} \quad \text{and} \quad v = \frac{2rs}{r^2 + s^2},$$

the same parametrization to the Pythagorean equation as in (6.1.1) when we clear out denominators.
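The map from slopes to rational points can be computed exactly with rational arithmetic. A minimal Python sketch using the standard `fractions` module (the function name is ours); $t = -2/1$, that is $r = 2$, $s = 1$, recovers the 3, 4, 5 triangle:

```python
from fractions import Fraction


def circle_point(t):
    """Second intersection (u, v) of the line y = t(x - 1) with x^2 + y^2 = 1."""
    t = Fraction(t)
    return (t * t - 1) / (t * t + 1), -2 * t / (t * t + 1)


u, v = circle_point(Fraction(-2, 1))
print(u, v)                  # 3/5 4/5
print(u * u + v * v == 1)    # True
```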


Exercise 6.1.2.† Find a formula for all the rational points on the curve $x^2 - y^2 = 3$.


Exercise 6.1.3. We call $\{a, b, c\}$ a primitive Pythagorean triple if $a$, $b$, and $c$ are pairwise coprime integers for which $a^2 + b^2 = c^2$.

(a) Prove that, in a primitive Pythagorean triple, the difference in length between the hypotenuse and each of the other sides is either a square or twice a square.

(b) Can one find primitive Pythagorean triples in which the hypotenuse is three units longer than one of the other sides? Either give an example or prove that it is impossible.

(c)† One can find primitive Pythagorean triples in which the hypotenuse is one unit longer than one of the other sides, e.g., {3, 4, 5}, {5, 12, 13}, {7, 24, 25}, {9, 40, 41}, {11, 60, 61}. Parametrize all such solutions.

(d)† One can find primitive Pythagorean triples in which the hypotenuse is two units longer than one of the other sides, e.g., {3, 4, 5}, {8, 15, 17}, {12, 35, 37}, {16, 63, 65}, {20, 99, 101}. Parametrize all such solutions.


Exercise 6.1.4. (a) Prove that the side lengths of a primitive Pythagorean triple are $\not\equiv 2 \pmod 4$.
(b) Given integer $n > 1$ with $n \not\equiv 2 \pmod 4$, explicitly give a primitive Pythagorean triple which has $n$ as a side length.

Exercise 6.1.5.† Prove that there are infinitely many triples of coprime squares in arithmetic progression.


Around 1637, Pierre de Fermat was studying the proof of (6.1.1) in his copy of 
Bachet’s translation of Diophantus’s Arithmetica. In the margin he wrote: 


I have discovered a truly marvellous proof that it is impossible to separate a 
cube into two cubes, or a fourth power into two fourth powers, or in general, 
any power higher than the second into two like powers. This margin is too 
narrow to contain it. 


—PIERRE DE FERMAT (1637), in his copy of Arithmetica 


In other words, Fermat claimed that for every integer $n \ge 3$ there do not exist positive integers $x, y, z$ for which

$$x^n + y^n = z^n.$$

This is known as "Fermat's Last Theorem". Fermat did not subsequently mention this problem or his truly marvellous proof elsewhere, and the proof has not, to date, been rediscovered, despite many efforts.³ Fermat did show that there are no solutions when $n = 4$ and we will present his proof in section 6.4, as well as some consequences for more general exponents $n$ in Fermat's Last Theorem.


6.2. No solutions to a Diophantine equation through descent 


Some Diophantine equations can be shown to have no solutions by starting with a 
purported smallest solution and finding an even smaller one, thereby establishing 
a contradiction. Such a proof by descent can be achieved in various different ways. 


No solutions through prime divisibility 


For some equations one can perform descent by considering the divisibility of the variables by various primes. We now give such a proof that $\sqrt 2$ is irrational.


Proof of Proposition 3.4.1 by 2-divisibility. [$\sqrt 2$ is irrational.] Let us recall that if $\sqrt 2$ is rational, then we can write it as $a/b$ so that $a^2 = 2b^2$. Let us suppose that $(b, a)$ gives the smallest solution to $y^2 = 2x^2$ in positive integers. Now 2 divides $2b^2 = a^2$ so that $2 \mid a$. Writing $a = 2A$, thus $b^2 = 2A^2$, and so $2 \mid b$. Writing $b = 2B$ we obtain a solution $A^2 = 2B^2$ where $A$ and $B$ are half the size of $a$ and $b$, contradicting the assumption that $(b, a)$ is minimal.


Exercise 6.2.1. Show that there are no non-zero integer solutions to $x^3 + 3y^3 + 9z^3 = 0$.

3 Fermat wrote several important thoughts about number theory on his personal copy of Arithmetica, without proof. When he died his son, Samuel, made these available by republishing Arithmetica with his father's annotations. This is the last of those claims to have been fully understood.


No solutions through geometric descent 


Proof of Proposition 3.4.1 by geometric descent. Again assume that $\sqrt 2 = a/b$ with $a$ and $b$ positive integers, where $a$ is minimal. Hence $a^2 = 2b^2$, which gives rise to the smallest isosceles, right-angled triangle, $OPQ$ with integer side lengths $OP = OQ = b$, $PQ = a$ and angles $POQ = 90^\circ$, $PQO = QPO = 45^\circ$. Now mark a point $R$ which is $b$ units along $PQ$ from $Q$ and then drop a perpendicular to meet $OP$ at the point $S$ so that $SR$ is perpendicular to $PQ$. Then $RPS = QPO = 45^\circ$, and so $RSP = 180^\circ - 90^\circ - 45^\circ = 45^\circ$ by considering the angles in the triangle $RSP$. Therefore $RSP$ is a smaller isosceles, right-angled triangle than $OPQ$. Moreover we have side lengths $RS = PR = a - b$. To establish our contradiction we need to show that the hypotenuse, $PS$, also has integer length.


Figure 6.3. No solutions through geometric descent. 


The two triangles, $OQS$ and $RQS$, are congruent, since they both contain a right angle opposite $SQ$ and adjacent to a side of length $b$ ($OQ$ and $RQ$, respectively). Therefore $OS = SR = a - b$ and so $PS = OP - OS = b - (a - b) = 2b - a$. Hence $RSP$ is a smaller isosceles, right-angled triangle than $OPQ$ with integer side lengths, contradicting the assumed minimality of $OPQ$.


One can write this proof more algebraically: As $a^2 = 2b^2$, so $a > b > a/2$. Now

$$(2b - a)^2 = a^2 - 4ab + 2b^2 + 2b^2 = a^2 - 4ab + 2b^2 + a^2 = 2(a - b)^2.$$

However $0 < 2b - a < a$, contradicting the minimality of $a$.
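The algebraic step is the substitution $(a, b) \mapsto (2b - a, a - b)$, and the same computation shows the identity $(2b - a)^2 - 2(a - b)^2 = -(a^2 - 2b^2)$ for all integers $a, b$. Since $a^2 = 2b^2$ has no positive solutions, there is nothing to run the descent on directly, so this minimal Python sketch (our own illustration, not from the text) applies the same map to the near-misses $a^2 - 2b^2 = \pm 1$, starting from $99^2 - 2 \cdot 70^2 = 1$. Along the chain the first coordinate never increases, and $a^2 - 2b^2$ only changes sign, until $b$ reaches 0.

```python
def descend(a, b):
    """One step of the algebraic descent: (a, b) -> (2b - a, a - b)."""
    return 2 * b - a, a - b


a, b = 99, 70                 # 99^2 - 2 * 70^2 = 1: a near-miss, not a solution
chain = [(a, b)]
while b > 0:
    a, b = descend(a, b)
    chain.append((a, b))

print(chain)   # [(99, 70), (41, 29), (17, 12), (7, 5), (3, 2), (1, 1), (1, 0)]
print([x * x - 2 * y * y for x, y in chain])   # [1, -1, 1, -1, 1, -1, 1]
```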
Proof of Proposition by an analogous descent. [If $d$ is an integer for which $\sqrt d$ is rational, then $\sqrt d$ is an integer.] If $\sqrt d$ is rational, then we can write it as $a/b$ so that $a^2 = db^2$. Let us suppose that $(b, a)$ gives the smallest solution to $y^2 = dx^2$ in positive integers. Let $r$ be the smallest integer $\ge db/a$, so that $db/a + 1 > r \ge db/a$ and therefore $a > ra - db \ge 0$. Then

$$\begin{aligned} (ra - db)^2 &= da^2 - 2rdab + d^2b^2 + (r^2 - d)a^2 \\ &= da^2 - 2rdab + d^2b^2 + (r^2 - d)db^2 = d(rb - a)^2. \end{aligned}$$

However $0 < ra - db < a$, contradicting the minimality of $a$, unless $ra - db = 0$. In this case $r^2 = d \cdot db^2/a^2 = d$.
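Behind this computation is a polynomial identity valid for all integers: $(ra - db)^2 - d(rb - a)^2 = (r^2 - d)(a^2 - db^2)$, whose right-hand side vanishes when $a^2 = db^2$. A minimal Python sketch (our own check, not from the text) tests the identity on random integers and computes one descent step, using integer ceiling division for $r$:

```python
import random


def identity_holds(a, b, d, r):
    """Check (ra - db)^2 - d(rb - a)^2 == (r^2 - d)(a^2 - d b^2)."""
    lhs = (r * a - d * b) ** 2 - d * (r * b - a) ** 2
    rhs = (r * r - d) * (a * a - d * b * b)
    return lhs == rhs


def descent_step(a, b, d):
    """r = smallest integer >= db/a (for a, b, d > 0), then (r, ra - db, rb - a)."""
    r = -(-d * b // a)          # ceiling division on integers
    return r, r * a - d * b, r * b - a


rng = random.Random(2019)
print(all(identity_holds(*(rng.randint(-50, 50) for _ in range(4))) for _ in range(1000)))  # True
print(descent_step(7, 4, 3))    # (2, 2, 1): from 7^2 - 3*4^2 = 1 to 2^2 - 3*1^2 = 1
print(descent_step(6, 2, 9))    # (3, 0, 0): ra - db = 0, and r^2 = 9 = d
```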


6.3. Fermat’s “infinite descent” 


Fermat proved that there are no right-angled triangles with all integer sides whose area is a square (see exercise 6.3.1 below). In so doing he developed the important technique of "infinite descent", which we now exhibit in two related questions. (The reader can read the proof of only one of the two following similar theorems. They both lead to the same Corollary 6.4.1.)


Theorem 6.1. There are no solutions in non-zero integers $x, y, z$ to

$$x^4 + y^4 = z^2.$$


Proof. Assume that there is a solution and let $x, y, z$ be the solution in positive integers with $z$ minimal. We may assume that $\gcd(x, y) = 1$ or else we can divide the equation through by the fourth power of $\gcd(x, y)$ to obtain a smaller solution. Here we have

$$(x^2)^2 + (y^2)^2 = z^2 \quad \text{with } \gcd(x^2, y^2) = 1,$$

and so, by (6.1.1), there exist integers $r, s$ with $(r, s) = 1$ and $r + s$ odd such that

$$x^2 = 2rs, \quad y^2 = r^2 - s^2, \quad \text{and} \quad z = r^2 + s^2$$

(swapping the roles of $x$ and $y$ if necessary to ensure that $x$ is even). Now $r$ and $s$ have the same sign since $rs = x^2/2 > 0$, so we may assume they are both $> 0$ (multiplying each by $-1$ if necessary). Now $s^2 + y^2 = r^2$ with $y$ odd and $(r, s) = 1$ and so, by (6.1.1), there exist integers $a, b$ with $(a, b) = 1$ and $a + b$ odd such that

$$s = 2ab, \quad y = a^2 - b^2, \quad \text{and} \quad r = a^2 + b^2,$$

and so

$$x^2 = 2rs = 4ab(a^2 + b^2).$$

Now $a$ and $b$ have the same sign since $ab = s/2 > 0$, and therefore we may assume they are both $> 0$ (multiplying each by $-1$ if necessary).

Now $a$, $b$, and $a^2 + b^2$ are pairwise coprime positive integers whose product is a square so they must each be squares by exercise b). Write $a = u^2$, $b = v^2$, and $a^2 + b^2 = w^2$ for some positive integers $u, v, w$. Therefore

$$u^4 + v^4 = a^2 + b^2 = w^2$$

yields another solution to the original equation. We wish to compare this to the solution $(x, y, z)$ we started with. We find that

$$w \le w^2 = a^2 + b^2 = r < r^2 + s^2 = z,$$

contradicting the minimality of $z$.


Theorem 6.2. There are no solutions in positive integers $x, y, z$ to

$$x^4 - y^4 = z^2.$$

Proof. If there is a solution, take the one with $x$ minimal. We may assume $(x, y) = 1$ or else we divide through by the fourth power of the common factor.

We begin by noting that

$$(y^2)^2 + z^2 = (x^2)^2 \quad \text{with } \gcd(x^2, y^2) = 1.$$

If $z$ is even, then, by (6.1.1), there exist integers $X, Y$ with $(X, Y) = 1$, of opposite parity, for which

$$x^2 = X^2 + Y^2 \quad \text{and} \quad y^2 = X^2 - Y^2, \quad \text{so that} \quad X^4 - Y^4 = (xy)^2.$$

Now $X^2 < x^2$, contradicting the minimality of $x$.

Therefore $z$ is odd. By (6.1.1) there exist integers $r, s$ with $(r, s) = 1$, of opposite parity, for which

$$x^2 = r^2 + s^2 \quad \text{and} \quad y^2 = 2rs.$$

Now $r$ and $s$ have the same sign since $rs = y^2/2 > 0$, and therefore we may assume they are both $> 0$ (multiplying each by $-1$ if necessary). From the equation $2rs = y^2$ we deduce that $r = 2R^2$, $s = Z^2$ for some integers $R, Z$ (swapping the roles of $r$ and $s$, if necessary). From (6.1.1) applied to the equation $r^2 + s^2 = x^2$, there exist integers $u, v$ with $(u, v) = 1$, of opposite parity, for which $r = 2uv$ and $s = u^2 - v^2$. Now $uv = r/2 = R^2$, so we may assume they are both positive (multiplying each by $-1$ if necessary), and so $u = m^2$, $v = n^2$ for some integers $m, n$. Therefore

$$m^4 - n^4 = u^2 - v^2 = s = Z^2.$$

Now $m^2 \le (mn)^2 = uv = r/2 < x/2$, contradicting the minimality of $x$.


Exercise 6.3.1 (Fermat, 1659).
(a)† Prove that there is no right-angled, integer-sided triangle whose area is a square.

(b) Deduce that there is no right-angled, rational-sided triangle whose area is 1.

(c) Deduce that there are no integer solutions to $x^4 + 4y^4 = z^2$.

In appendix 6B we will see an alternative proof of these results using classical Greek geometry.


6.4. Fermat’s Last Theorem 


Fermat's Last Theorem is the assertion that for every integer $n \ge 3$ there do not exist positive integers $x, y, z$ for which

$$x^n + y^n = z^n.$$

Corollary 6.4.1 (Fermat). There are no solutions in non-zero integers $x, y, z$ to

$$x^4 + y^4 = z^4.$$

Exercise 6.4.1. Prove this using Theorem 6.1 or Theorem 6.2.


We deduce that Fermat's Last Theorem holds for all exponents $n \ge 3$ if it holds for all odd prime exponents:

Proposition 6.4.1. If Fermat's Last Theorem is false, then there exists an odd prime $p$ and pairwise coprime non-zero integers $x, y, z$ such that

$$x^p + y^p + z^p = 0.$$

Proof. Suppose that $x^n + y^n = z^n$ with $x, y, z > 0$ and $n \ge 3$. If two of $x$, $y$, and $z$ have a common factor, then it must divide the third and so we can divide out the common factor. Hence we may assume that $x, y, z$ are pairwise coprime positive integers. Now any integer $n \ge 3$ has a factor $m$ which is either $= 4$ or is an odd prime (see exercise 3.1.3(b)). Hence, if $n = dm$, then $(x^d)^m + (y^d)^m = (z^d)^m$, so we get a solution to Fermat's Last Theorem with exponent $m$. We can rule out $m = 4$ by Corollary 6.4.1. Therefore $m = p$ is an odd prime and we have the desired solution $(x^d)^p + (y^d)^p + (-z^d)^p = 0$.
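The factor $m$ used in the proof is easy to compute: if $n$ has an odd prime factor, take the smallest one; otherwise $n$ is a power of 2 that is at least 4, so 4 divides $n$. A minimal Python sketch (the function name is ours):

```python
def fermat_exponent_factor(n):
    """A divisor m of n (n >= 3) with m = 4 or m an odd prime, as in the proof."""
    if n < 3:
        raise ValueError("n must be at least 3")
    odd_part = n
    while odd_part % 2 == 0:
        odd_part //= 2
    if odd_part == 1:          # n is a power of 2 and n >= 4, so 4 divides n
        return 4
    p = 3
    while odd_part % p != 0:   # the first odd divisor >= 3 of an odd number is prime
        p += 2
    return p


for n in (3, 4, 6, 8, 10, 12, 100, 2**10):
    print(n, fermat_exponent_factor(n))
```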


A brief history of equation solving 


There have been many attempts to prove Fermat’s Last Theorem, inspiring the 
development of much great mathematics, for example, ideal theory (see appendices 
3D and 12B). We will discuss one beautiful advance due to Sophie Germain from the beginning of the 19th century (see section 7.27 of appendix 7F).


In 1994 Andrew Wiles proved Fermat’s Last Theorem, developing ideas of Frey, 
Ribet, and Serre involving modular forms, a subject far removed from the original 
question. The proof is extraordinarily deep, involving some of the most profound 
themes in arithmetic geometry.⁴ If the whole proof were written in the leisurely
style of, say, this book, it would probably take a couple of thousand pages. This 
could not be the proof that Fermat believed that he had—could Fermat have been 
correct? Could there be a short, elementary, marvelous proof still waiting to be 
found? Or will Fermat’s claim always remain a mystery? 


To some extent one can measure the difficulty of solving Diophantine equations (especially rational solutions to equations with two variables) by their degree.⁵ The first three chapters of this book focus on linear (degree-one) equations, culminating in section 3.6. Much of the rest of this book provides tools for studying degree-two (quadratic) equations; see chapters 8 and 9, sections 11.2 and 11.3, and chapter 12. Degree-three (cubic) equations give rise to elliptic curves; many of the key questions about elliptic curves lie shrouded in mystery and so they are intensively researched in number theory today (see chapter 17). In 1983 Gerd Faltings showed that higher-degree Diophantine equations only have finitely many rational solutions (though not how to find those solutions).

For higher-degree equations perhaps the most interesting cases are Diophantine 
equations with varying degree, like the Fermat equation. Another famous example 
is Catalan’s conjecture: The positive integer powers are 


1, 4, 8, 9, 16, 25, 27, 32, 36, 49, 64, ...,

which seem to get wider spread out as they get larger. Only two of the numbers in our list, 8 and 9, differ by 1, and Catalan conjectured that this is the only example of powers differing by 1. That is, the only integer solution to

$$x^p - y^q = 1 \quad \text{with } x, y \ne 0 \text{ and } p, q \ge 2$$

is $3^2 - 2^3 = 1$. This was shown to be true by Preda Mihailescu in 2002.

4 See our sequel for some discussion of the ideas involved in the proof.

5 A better but more sophisticated invariant is the genus, which requires quite a bit of algebraic geometry to define and is beyond the scope of this book.
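A brute-force search is consistent with Catalan's conjecture, though of course it proves nothing beyond the range searched. A minimal Python sketch, our own, listing the perfect powers up to $10^6$ and the consecutive pairs among them:

```python
def perfect_powers(limit):
    """All integers in [1, limit] of the form x^k with x >= 1 and k >= 2."""
    powers = {1}
    x = 2
    while x * x <= limit:
        p = x * x
        while p <= limit:
            powers.add(p)
            p *= x
        x += 1
    return sorted(powers)


pp = perfect_powers(10**6)
print(pp[:11])                                              # [1, 4, 8, 9, 16, 25, 27, 32, 36, 49, 64]
print([(u, v) for u, v in zip(pp, pp[1:]) if v - u == 1])   # [(8, 9)]
```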


Combining these two famous equations leads to the Fermat-Catalan equation

$$x^p + y^q = z^r \quad \text{where } (x, y, z) = 1 \text{ and } \frac{1}{p} + \frac{1}{q} + \frac{1}{r} < 1.$$

We insist that $(x, y, z) = 1$ because one can find "trivial" solutions like $2^k + 2^k = 2^{k+1}$ in many cases (see exercise 6.5.8 for more examples). Obviously there are solutions when one of $p, q, r$ is 1, so we insist they are all $\ge 2$. One can find solutions when two of the exponents equal 2, and so the peculiar looking condition $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} < 1$ turns out to be the correct one. We do know of ten solutions:


$1 + 2^3 = 3^2$, $2^5 + 7^2 = 3^4$, $7^3 + 13^2 = 2^9$, $2^7 + 17^3 = 71^2$, $3^5 + 11^4 = 122^2$,
$17^7 + 76271^3 = 21063928^2$, $1414^3 + 2213459^2 = 65^7$, $9262^3 + 15312283^2 = 113^7$,
$43^8 + 96222^3 = 30042907^2$, $33^8 + 1549034^2 = 15613^3$.
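These identities, and the side conditions, are quick to confirm with exact integer arithmetic. In this minimal Python sketch each entry is $(x, p, y, q, z, r)$ with $x^p + y^q = z^r$; the exponent 7 attached to the base 1 in the first entry is our arbitrary choice, since $1^p = 1$ for every $p$.

```python
from fractions import Fraction
from math import gcd

SOLUTIONS = [
    (1, 7, 2, 3, 3, 2),
    (2, 5, 7, 2, 3, 4),
    (7, 3, 13, 2, 2, 9),
    (2, 7, 17, 3, 71, 2),
    (3, 5, 11, 4, 122, 2),
    (17, 7, 76271, 3, 21063928, 2),
    (1414, 3, 2213459, 2, 65, 7),
    (9262, 3, 15312283, 2, 113, 7),
    (43, 8, 96222, 3, 30042907, 2),
    (33, 8, 1549034, 2, 15613, 3),
]

for x, p, y, q, z, r in SOLUTIONS:
    assert x**p + y**q == z**r                                   # the identity itself
    assert gcd(gcd(x, y), z) == 1                                # (x, y, z) = 1
    assert Fraction(1, p) + Fraction(1, q) + Fraction(1, r) < 1  # exponent condition
    assert 2 in (p, q, r)                                        # an exponent equals 2
print("all", len(SOLUTIONS), "identities check out")
```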


It is conjectured that there are only finitely many solutions $x^p, y^q, z^r$ to the Fermat-Catalan equation; perhaps these ten are all the solutions. All of our ten solutions have an exponent equal to 2. So one might further conjecture that there are no solutions to the Fermat-Catalan equation with $p, q, r$ all $> 2$. These are open questions and mathematicians are making headway. Henri Darmon and I proved in 1995 that there are only finitely many solutions for each fixed triple $p, q, r$. Today we know that for various infinite families of exponent triples $p, q, r$, the Fermat-Catalan equation has no solutions: For example when $p = q$ and $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} < 1$ there are no solutions if $r$ is divisible by 2 or by 3 or by $p$, or if $p$ is even and $r$ is divisible by 5, etc. (see [1] for the state of the art).


Now that Fermat's Last Theorem has been proved, what can take its place as the "holy grail" of Diophantine equations? The abc-conjecture is clearly an important problem that would have profound effects on equations and even in other areas of number theory. In appendix 6A we will discuss its analogue for polynomials and then discuss the abc-conjecture itself and its influence on other equations, in section 11.5.


References for this chapter 


[1] Michael Bennett, Imin Chen, Sander Dahmen, Soroosh Yazdani, Generalized Fermat equations: A 
miscellany, Int. J. Number Theory 11 (2015), 1-28. 


[2] John J. Watkins, chapter 5 of Number theory, a historical approach, Princeton University Press, 2014.


Additional exercises 


Exercise 6.5.1. Find all rational-sided right-angled triangles in which the area equals the perimeter. Prove that 5, 12, 13 and 6, 8, 10 are the only such integer-sided triangles.


Exercise 6.5.2.† Let $n$ be an integer $> 2$ that is $\not\equiv 2 \pmod 4$. Prove that there are $2^{\omega(n) - 1}$ distinct primitive Pythagorean triangles in which $n$ is the length of a side which is not the hypotenuse, where $\omega(n)$ counts the number of distinct prime factors of $n$.

Exercise 6.5.3.† Find a 1-to-1 correspondence between pairs of integers $b, c > 0$ for which $x^2 - bx - c$ and $x^2 - bx + c$ are both factorable over $\mathbb{Z}$, and right-angled triangles in which all three sides are integers.


Exercise 6.5.4. Prove that if f(x) ∈ Z[x] is a quadratic polynomial for which f(x) and f(x) + 1
both have integer roots, then f(x) + 1 is the square of a linear polynomial. (Try substituting the
roots of f(x) into f(x) + 1 and studying divisibilities of the differences of the roots.)


Exercise 6.5.5.† We wish to show that $\alpha = \frac{1+\sqrt{5}}{2}$ is irrational. Suppose it is rational, so that
α = p/q with (p, q) = 1. Now α satisfies the equation $x^2 = x + 1$, so dividing through by x we
have x = (1 + x)/x, and so α = (p + q)/p. Prove that p/q cannot equal (p + q)/p and therefore
establish a contradiction.


Exercise 6.5.6. Generalize the proof in the last exercise to prove that if α is a rational root of
$x^2 - ax - b$ ∈ Z[x], then α is an integer which divides b.


Exercise 6.5.7. Prove that 2n is the length of the perimeter of a right-angled integer-sided
triangle if and only if there exist divisors $d_1, d_2$ of n for which $d_1 < d_2 < 2d_1$.


Exercise 6.5.8. Suppose that integers p, q, r are given. For any integers a and b let $c = a^p + b^q$. If
we multiply this through by $c^n$, where n is divisible by p and q, then $(ac^{n/p})^p + (bc^{n/q})^q = c^{n+1}$.
Determine conditions on p, q, and r under which we can find an integer n such that $c^{n+1}$ is an rth
power (and therefore find an integer solution to $x^p + y^q = z^r$, albeit with (x, y, z) > 1).


Exercise 6.5.9. Calculations show that every integer in [129, 300] is the sum of distinct squares.
Deduce that every integer > 128 is the sum of distinct squares. (In exercise 2.5.6(f) we showed
that there are infinitely many integers that cannot be written as the sum of three squares. In
appendix 12E we will show that every integer is the sum of four squares.)


Exercise 6.5.10. Prove that there are infinitely many integers that cannot be written as the sum 
of three cubes. 


Exercise 6.5.11.† Calculations show that every integer in [12759, 30000] is the sum of distinct
cubes of positive integers. Deduce that every integer > 12758 is the sum of distinct cubes of
positive integers. (In 2015 Siksek showed that every integer > 454 is the sum of at most seven
positive cubes. It is believed, but not proven, that every sufficiently large integer is the sum of at
most four positive cubes.)
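The finite calculations quoted in exercises 6.5.9 and 6.5.11 can be reproduced with a short subset-sum computation. The sketch below is our own illustration, not necessarily how the quoted calculations were done: a Python integer serves as a bit set whose bit n is 1 exactly when n is a sum of distinct kth powers, and each power is shifted in only once so that no power is used twice. It checks only the finite ranges; the "deduce" step in each exercise still needs an argument.

```python
def distinct_power_sums(k: int, limit: int) -> int:
    """Bit set (a Python int): bit n is 1 iff 0 <= n <= limit is a sum of
    distinct k-th powers of positive integers (the empty sum gives n = 0)."""
    mask = (1 << (limit + 1)) - 1
    reachable = 1  # only the empty sum so far
    base = 1
    while base ** k <= limit:
        # 0/1 subset sum: OR in a shifted copy of the *previous* set,
        # so each power base**k is used at most once.
        reachable |= (reachable << base ** k) & mask
        base += 1
    return reachable


def representable(reachable: int, n: int) -> bool:
    return (reachable >> n) & 1 == 1


squares = distinct_power_sums(2, 300)
print(representable(squares, 128))                              # False
print(all(representable(squares, n) for n in range(129, 301)))  # True

cubes = distinct_power_sums(3, 30000)
print(representable(cubes, 12758))                                # False
print(all(representable(cubes, n) for n in range(12759, 30001)))  # True
```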


Exercise 6.5.12. Verify the identity $6x = (x + 1)^3 + (x - 1)^3 - 2x^3$. Deduce that every prime is
the sum of no more than five cubes of integers (which can be positive or negative).


Exercise 6.5.13. (a) Prove that $n^4 \equiv 0$ or 1 (mod 16) for all integers n.
Let N be divisible by 16.
(b) Show that if N is the sum of 15 fourth powers, then each of those fourth powers is even.
(c) Deduce that N is the sum of 15 fourth powers if and only if N/16 is the sum of 15 fourth
powers.
(d) Prove that 31 is not the sum of 15 fourth powers but is the sum of 16 fourth powers.
(e) Deduce that there are infinitely many positive integers N that are not the sum of 15 fourth powers.


(In 2005, Deshouillers, Kawada, and Wooley showed that every integer > 13792 can be 
written as the sum of 16 fourth powers.) 


In 1770 Waring asked whether for all integers k there exists an integer g(k)
such that every positive integer is the sum of at most g(k) kth powers of positive
integers. This was proved by Hilbert in 1909 but it is still a challenge to evaluate
the smallest possible g(k) for each k. We discuss this further in appendix 17D.
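For small n the least number of kth powers needed can be computed by dynamic programming. The sketch below uses a standard unbounded "coin change" recurrence (the function name is ours). It confirms exercise 6.5.13(d), that 31 needs 16 fourth powers, as well as the classical facts that 79 needs 19 fourth powers and that 23 and 239 need 9 cubes. It says nothing about g(k) beyond the range actually computed.

```python
def least_kth_powers(k: int, limit: int) -> list[int]:
    """best[n] = least number of k-th powers of positive integers (repetition
    allowed) summing to n, for 0 <= n <= limit."""
    powers = []
    base = 1
    while base ** k <= limit:
        powers.append(base ** k)
        base += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # 1 = 1**k is always available, so the minimum is over a non-empty set.
        best[n] = 1 + min(best[n - w] for w in powers if w <= n)
    return best


fourth = least_kth_powers(4, 100)
print(fourth[31], fourth[79])   # 16 19
cube = least_kth_powers(3, 300)
print(cube[23], cube[239])      # 9 9
```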


Appendix 6A. Polynomial 
solutions of Diophantine 
equations 


6.6. Fermat’s Last Theorem in C[t]


The notation C[t] denotes polynomials whose coefficients are complex numbers. In
section 6.1 we saw that all integer solutions to $x^2 + y^2 = z^2$ are derived from letting
t be a rational number in the polynomial solution

$$(t^2 - 1)^2 + (2t)^2 = (t^2 + 1)^2.$$


We now prove that there are no “genuine” polynomial solutions to Fermat’s equation

$$x^p + y^p = z^p \tag{6.6.1}$$

with exponent p larger than 2 (where by genuine we mean that (x(t), y(t), z(t)) is
not a polynomial multiple of a solution of (6.6.1) in complex numbers).


Proposition 6.6.1. There are no genuine polynomial solutions x(t), y(t), z(t) ∈
C[t] to $x(t)^p + y(t)^p = z(t)^p$ with p ≥ 3.


Proof. Assume that there is a solution with x, y, and z all non-zero to (6.6.1)
where p ≥ 3. We may assume that x, y, and z have no common (polynomial)
factor or else we can divide out by that factor (and that they are pairwise coprime
by the same argument as in section 6.1). Our first step will be to differentiate
to get

$$p x^{p-1} x' + p y^{p-1} y' = p z^{p-1} z',$$

and after dividing out the common factor p, this leaves us with

$$x^{p-1} x' + y^{p-1} y' = z^{p-1} z'. \tag{6.6.2}$$






We now have two linear equations (6.6.1) and (6.6.2) (thinking of $x^{p-1}$, $y^{p-1}$, and
$z^{p-1}$ as our variables), which suggests we use linear algebra to eliminate a variable:

Multiply (6.6.1) by y' and (6.6.2) by y, and subtract, to get

$$x^{p-1}(xy' - yx') = x^{p-1}(xy' - yx') + y^{p-1}(yy' - yy') = z^{p-1}(zy' - yz').$$

Therefore $x^{p-1}$ divides $z^{p-1}(zy' - yz')$, but since x and z have no common factors,
this implies that

$$x^{p-1} \text{ divides } zy' - yz'. \tag{6.6.3}$$

This is a little surprising, for if $zy' - yz'$ is non-zero, then a high power of x divides
$zy' - yz'$, something that does not seem consistent with (6.6.1).

Now, if $zy' - yz' = 0$, then $(y/z)' = 0$ and so y is a constant multiple of z,
contradicting our statement that y and z have no common factor. Therefore (6.6.3)
implies, taking degrees of both sides, that

$$(p-1)\,\mathrm{degree}(x) \le \mathrm{degree}(zy' - yz') \le \mathrm{degree}(y) + \mathrm{degree}(z) - 1,$$

since degree(y') = degree(y) − 1 and degree(z') = degree(z) − 1. Adding degree(x)
to both sides gives

$$p\,\mathrm{degree}(x) < \mathrm{degree}(x) + \mathrm{degree}(y) + \mathrm{degree}(z). \tag{6.6.4}$$


The right side of (6.6.4) is symmetric in x, y, and z. The left side is a function of
x simply because of the order in which we chose to do things above. We could just
as easily have derived the same statement with y or z in place of x on the left side
of (6.6.4), so that

$$p\,\mathrm{degree}(y) < \mathrm{degree}(x) + \mathrm{degree}(y) + \mathrm{degree}(z)$$

and

$$p\,\mathrm{degree}(z) < \mathrm{degree}(x) + \mathrm{degree}(y) + \mathrm{degree}(z).$$

Adding these last three inequalities together and then dividing out by degree(x) +
degree(y) + degree(z) (which is positive, as the solution is genuine) implies
p < 3, contradicting p ≥ 3,
and so Fermat’s Last Theorem is proved, at least for polynomials.


That Fermat’s Last Theorem is not difficult to prove for polynomials is an old 
result, going back certainly as far as Liouville in 1851. 


Exercise 6.6.1. Prove that all solutions to $x(t)^2 + y(t)^2 = z(t)^2$ in polynomials are a scalar
multiple of some solution of the form $(r(t)^2 - s(t)^2)^2 + (2r(t)s(t))^2 = (r(t)^2 + s(t)^2)^2$.


6.7. a + b = c in C[t]


We now intend to extend the idea in our proof of Fermat’s Last Theorem for 
polynomials to as wide a range of questions as possible. It takes a certain genius 
to generalize to something far simpler than the original. But what could possibly 
be more simply stated, yet more general, than Fermat’s Last Theorem? It was 
Richard C. Mason (1983) who gave us that insight: Look for solutions to 


$$a + b = c.$$


We will just follow through the above proof of Fermat’s Last Theorem for polynomials
(Proposition 6.6.1) and see where it leads: Start by assuming, with no loss
of generality, that a, b, and c are all non-zero polynomials without common factors
(or else all three share the common factor and we can divide it out). Then we
differentiate to get

$$a' + b' = c'.$$

Next we need to do linear algebra. It is not quite so obvious how to proceed
analogously, but what we do learn in a linear algebra course is to put our coefficients
in a matrix and solutions follow if the determinant is non-zero. This suggests
defining

$$\Delta(t) := \begin{vmatrix} a(t) & b(t) \\ a'(t) & b'(t) \end{vmatrix}.$$

Then if we add the first column to the second, we get

$$\Delta(t) = \begin{vmatrix} a(t) & c(t) \\ a'(t) & c'(t) \end{vmatrix},$$

and similarly

$$\Delta(t) = \begin{vmatrix} c(t) & b(t) \\ c'(t) & b'(t) \end{vmatrix}$$

by adding the second column to the first, a beautiful symmetry.

We note that Δ(t) ≠ 0, or else $ab' - a'b = 0$ so b is a scalar multiple of a (with
the same argument as above), contradicting our hypothesis.

To find the appropriate analogy to (6.6.3), we consider the power to which
the factors of a (as well as b and c) divide our determinant: Let α be a root of
a(t), and suppose that $(t-\alpha)^e$ is the highest power of (t − α) which divides a(t)
(we write $(t-\alpha)^e \,\|\, a(t)$). Now we can write $a(t) = U(t)(t-\alpha)^e$ where U(t) is a
polynomial that is not divisible by (t − α), so that $a'(t) = (t-\alpha)^{e-1}V(t)$ where
$V(t) := U'(t)(t-\alpha) + eU(t)$. Now (t − α, V(t)) = (t − α, eU(t)) = 1, and so
$(t-\alpha)^{e-1} \,\|\, a'(t)$. Therefore

$$\Delta(t) = a(t)b'(t) - a'(t)b(t) = (t-\alpha)^{e-1}W(t),$$

where $W(t) := U(t)(t-\alpha)b'(t) - V(t)b(t)$ and (t − α, W(t)) = (t − α, V(t)b(t)) = 1
as t − α does not divide b(t) or V(t). Therefore we have proved that

$$(t-\alpha)^{e-1} \,\|\, \Delta(t).$$

This implies that $(t-\alpha)^e$ divides Δ(t)(t − α). Multiplying all such $(t-\alpha)^e$ together
we obtain (since they are pairwise coprime) that

$$a(t) \text{ divides } \Delta(t) \prod_{\alpha:\, a(\alpha)=0} (t-\alpha).$$


In fact a(t) only appears on the left side of this equation because we studied the
linear factors of a; analogous statements for b(t) and c(t) are also true, and since
a(t), b(t), c(t) have no common roots, we can combine those statements to read

$$a(t)b(t)c(t) \text{ divides } \Delta(t) \prod_{\alpha:\,(abc)(\alpha)=0} (t-\alpha). \tag{6.7.1}$$

The next step is to take the degrees of both sides of (6.7.1). The degree of
$\prod_{(abc)(\alpha)=0}(t-\alpha)$ is precisely the total number of distinct roots of a(t)b(t)c(t).




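Therefore

$$\mathrm{degree}(a) + \mathrm{degree}(b) + \mathrm{degree}(c) \le \mathrm{degree}(\Delta) + \#\{\alpha \in \mathbb{C} : (abc)(\alpha) = 0\}.$$

Now, using the three different representations of Δ above, we have

$$\mathrm{degree}(\Delta) \le \min\{\mathrm{degree}(a) + \mathrm{degree}(b) - 1,\ \mathrm{degree}(a) + \mathrm{degree}(c) - 1,\ \mathrm{degree}(c) + \mathrm{degree}(b) - 1\}.$$

Inserting all this into the previous inequality we get

$$\mathrm{degree}(a),\ \mathrm{degree}(b),\ \mathrm{degree}(c) < \#\{\alpha \in \mathbb{C} : (abc)(\alpha) = 0\}.$$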


Put another way, this result can be read as: 


Theorem 6.3 (The abc Theorem for Polynomials). If a(t), b(t), c(t) ∈ C[t] do not
have any common roots and provide a genuine polynomial solution to a(t) + b(t) =
c(t), then the maximum of the degrees of a(t), b(t), c(t) is less than the number of
distinct roots of a(t)b(t)c(t) = 0.


This is a “best possible” result in that we can find infinitely many examples
where there is exactly one more zero of a(t)b(t)c(t) = 0 than the largest of the
degrees, for example the familiar identity

$$(2t)^2 + (t^2 - 1)^2 = (t^2 + 1)^2;$$

or the rather less interesting

$$t^n + 1 = (t^n + 1).$$
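Theorem 6.3 can be checked on explicit examples using exact rational arithmetic. The number of distinct complex roots of f is degree(f) − degree(gcd(f, f')), because the repeated roots of f are exactly the common roots of f and f'. The sketch below (all helper names are ours; polynomials are lists of Fraction coefficients, lowest degree first) compares the two sides of the theorem for the two examples above and confirms that the three determinants representing Δ(t) agree.

```python
from fractions import Fraction


def poly(*coeffs):
    """Polynomial from integer coefficients, lowest degree first."""
    return trim([Fraction(c) for c in coeffs])


def trim(f):
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def deg(f):
    return len(trim(f)) - 1          # the zero polynomial gets degree -1


def add(f, g):
    n = max(len(f), len(g))
    f = f + [Fraction(0)] * (n - len(f))
    g = g + [Fraction(0)] * (n - len(g))
    return trim([x + y for x, y in zip(f, g)])


def neg(f):
    return [-x for x in f]


def mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return trim(out)


def derivative(f):
    return trim([i * c for i, c in enumerate(f)][1:])


def remainder(f, g):
    """Remainder of f on division by the non-zero polynomial g."""
    f, g = trim(f), trim(g)
    while len(f) >= len(g):
        q, shift = f[-1] / g[-1], len(f) - len(g)
        for i, c in enumerate(g):
            f[i + shift] -= q * c
        f = trim(f)                  # the leading term is now exactly 0
    return f


def gcd(f, g):
    f, g = trim(f), trim(g)
    while g:
        f, g = g, remainder(f, g)
    return f


def distinct_roots(f):
    return deg(f) - deg(gcd(f, derivative(f)))


def wronskian(f, g):                 # the determinant f g' - f' g
    return add(mul(f, derivative(g)), neg(mul(derivative(f), g)))


def abc_sides(a, b):
    """(max degree of a, b, c, number of distinct roots of abc), where c = a + b."""
    c = add(a, b)
    return max(deg(a), deg(b), deg(c)), distinct_roots(mul(mul(a, b), c))


a, b = poly(0, 0, 4), poly(1, 0, -2, 0, 1)          # (2t)^2 and (t^2 - 1)^2
print(abc_sides(a, b))                               # (4, 5)
print(abc_sides(poly(0, 0, 0, 0, 0, 1), poly(1)))    # t^5 + 1 = (t^5 + 1): (5, 6)
c = add(a, b)
print(wronskian(a, b) == wronskian(a, c) == wronskian(c, b))  # True
```

In both examples the number of distinct roots exceeds the largest degree by exactly one, which is the "best possible" phenomenon described above.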


Exercise 6.7.1. Let a, b, and c be given non-zero integers, and suppose n, p, q, r > 1.

(a) Prove that there are no genuine polynomial solutions x(t), y(t), z(t) to $ax^n + by^n = cz^n$
with n ≥ 3.

(b) Prove that if there is a genuine polynomial solution x(t), y(t), z(t) to $ax^p + by^q = cz^r$ in
which x, y, and z have no common root, then $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} > 1$.

(c) Deduce that in (b) at least one of p, q, and r must equal 2.

(d) One can find solutions in (b) if one allows common factors, for example $x^3 + y^3 = z^4$ where
$x = t(t^3 + 1)$ and $y = z = t^3 + 1$. Generalize this construction to as many other sets of
exponents p, q, r as you can. (Try to go beyond the construction in exercise 6.5.8.)


Exercise 6.7.2. Let a and b be given non-zero integers, p, q > 1, and x(t), y(t) ∈ C[t]. Let D be
the maximum of the degrees of $x^p$ and $y^q$, and assume that $ax^p + by^q \ne 0$.
(a) Prove that the degree of $ax^p + by^q$ is $\ge D\left(1 - \frac{1}{p} - \frac{1}{q}\right) + 1$.
(b)† Prove that if g = (p, q) > 1, then the degree of $ax^p + by^q$ is ≥ D/g.
(c) Deduce that the degree of $ax^p + by^q$ is always > D/6.
(This is “best possible” in the case $(t^2 + 2)^3 - (t^3 + 3t)^2 = 3t^2 + 8$.)


Appendix 6B. No 
Pythagorean triangle 
of square area 

via Euclidean geometry 


In this appendix we use Euclidean geometry to show that there is no integer-sided
right-angled triangle whose area is a square,* rather than the algebraic methods
of section 6.3. In this proof one sees algebra and geometry working together, foreshadowing
a theme one frequently encounters as one studies advanced mathematics.


An algebraic proof, by descent 


We will suppose that there are integer-sided right-angled triangles with square area
and establish a contradiction. We take the integer-sided right-angled triangle ABC
whose area is a square with smallest hypotenuse. This triangle must be primitive (if all
three sides shared a factor g > 1, dividing through by g would give a smaller such triangle,
its area being divided by $g^2$), so by (6.1.1) its sides have lengths

$$AB = 2MN, \quad AC = N^2 - M^2, \quad \text{and} \quad BC = M^2 + N^2,$$

where M and N are coprime positive integers of different parities. The area of this
triangle, MN(N − M)(N + M), is a square by hypothesis. We now prove that the
factors M, N, N − M, and N + M are pairwise coprime:

We have (M, N + M) = (N + M, M) = (N, M) = 1, and similarly (M, N − M) =
(N, N + M) = (N, N − M) = (M, N) = 1. Finally let g = (N − M, N + M). Then g is odd
since N − M is odd. Moreover g divides both (N + M) − (N − M) = 2M and
(N + M) + (N − M) = 2N, so that g divides (2M, 2N) = 2(M, N) = 2. The only odd
positive integer g dividing 2 is g = 1.

We have proved that the product MN(N − M)(N + M) is a square and that
the factors M, N, N − M, and N + M are pairwise coprime. Since they are all
positive integers, this implies that each of M, N, N − M, and N + M has to be a
square. We write

$$M = m^2, \quad N = n^2, \quad N - M = p^2, \quad \text{and} \quad N + M = q^2$$

for integers m, n, p, q, where p ≡ q (mod 2).

* This proof is due to a student, Stephanie Chan, working with me in London in 2017.
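Now we take the triangle UVW with integer side lengths

$$UV = \frac{q-p}{2}, \quad VW = \frac{q+p}{2}, \quad \text{and} \quad UW = n.$$

This is a right-angled triangle since $\left(\frac{q+p}{2}\right)^2 + \left(\frac{q-p}{2}\right)^2 = \frac{p^2 + q^2}{2} = N = n^2$, and it has
area $\frac{1}{2} \cdot \frac{q+p}{2} \cdot \frac{q-p}{2} = \frac{q^2 - p^2}{8} = \frac{2M}{8} = \frac{m^2}{4} = \left(\frac{m}{2}\right)^2$, which is a square. However its hypotenuse,
n, is ≤ N < $M^2 + N^2$, which contradicts the minimality of ABC.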
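The descent shows there is no example at all. As a sanity check only (not a proof), a direct search over the parametrization (6.1.1) finds none either. Only primitive triangles need to be searched, since scaling a triangle by g multiplies its area by $g^2$. The bound 300 below is an arbitrary choice.

```python
from math import gcd, isqrt


def is_square(n: int) -> bool:
    return n >= 0 and isqrt(n) ** 2 == n


def square_area_triangles(max_n: int) -> list[tuple[int, int]]:
    """Pairs (M, N), coprime, of opposite parity, 0 < M < N <= max_n, for which
    the triangle (2MN, N^2 - M^2, M^2 + N^2) has square area MN(N - M)(N + M)."""
    hits = []
    for n in range(2, max_n + 1):
        for m in range(1, n):
            if (n - m) % 2 == 1 and gcd(m, n) == 1:
                if is_square(m * n * (n - m) * (n + m)):
                    hits.append((m, n))
    return hits


print(square_area_triangles(300))   # []
```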


Remarkably, Stephanie Chan showed that a scalar multiple of the triangle 
UVW could be constructed using a ruler and compass: 


6.8. A geometric viewpoint 


We set T = M/N, so that T, 1 − T = (N − M)/N, and 1 + T = (N + M)/N are
all squares (of rational numbers). We now construct an integer-sided right-angled triangle A’B’C’ from
ABC whose area is a square, such that when we divide the sides by a common
factor, we will obtain UVW.


Let 2β be the angle BCA, so that $\tan 2\beta = \frac{AB}{AC} = \frac{2MN}{N^2 - M^2}$, and one can deduce
tan β = T from the usual double-angle formulas. Draw a circle centred at C
through B, and then extend the line AC beyond C to a point C’ where it meets the circle.

The geometry implies that the angle BC’A is β (the triangle BCC’ is isosceles, and 2β is its
exterior angle at C).


Figure 6.4. Halving the angle 2β of the initial triangle ABC.


Now rotate the line AC’ about A by drawing a circle centered at A through C’. 
Draw a line parallel to AC passing through B, which meets the circle in two points; 
let B’ denote the intersection of this line and circle which is furthest from C’. 




Figure 6.5. Obtaining a new point B’ at the intersection of the line and the circle. 


Extending the line AC’ we can drop a perpendicular from B’ to this line and denote
the intersection by A’. Our new triangle is A’B’C’. Let α denote the angle B’C’A’.


Figure 6.6. Forming the new triangle A’B’C’.


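Now AB’ = AC’ as they are both radii of the same circle, and AB = A’B’ by
construction. The angle B’AA’ equals 2α, being an exterior angle of the isosceles
triangle AB’C’. Therefore

$$\sin 2\alpha = \frac{A'B'}{AB'} = \frac{AB}{AC'} = \tan\beta = T,$$

and so $\cos 2\alpha = \sqrt{1 - T^2}$. Since 1 + T and 1 − T are both squares, therefore
$\sqrt{1+T} > \sqrt{1-T} > 0$ are both rational numbers. Moreover

$$\left(\sqrt{1+T} + \sqrt{1-T}\right)^2 = 2 + 2\sqrt{1-T^2} = 2 + 2\cos 2\alpha = (2\cos\alpha)^2.$$

Taking square roots, we deduce that $\cos\alpha = \frac{1}{2}\left(\sqrt{1+T} + \sqrt{1-T}\right)$, as they are both
positive, and so cos α is a rational number. Similarly $\sin\alpha = \frac{1}{2}\left(\sqrt{1+T} - \sqrt{1-T}\right)$
is also a rational number.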
Now A’B’ = AB is an integer by the hypothesis. Therefore

$$B'C' = \frac{A'B'}{\sin\alpha} \quad \text{and then} \quad A'C' = B'C' \cdot \cos\alpha$$

are both rational; that is, A’B’C’ is rational-sided. The area of this new triangle is

$$\frac{1}{2} \cdot A'B' \cdot A'C' = \frac{1}{2}(B'C' \sin\alpha)(B'C' \cos\alpha) = \frac{1}{4}(B'C')^2 \sin 2\alpha = \frac{T}{4}(B'C')^2,$$

which is a square, as T is a square. The side lengths of our new triangle are

$$A'B' = 2MN, \quad A'C' = 2N\left(N + \sqrt{N^2 - M^2}\right), \quad \text{and} \tag{6.8.1}$$

$$B'C' = 2N\left(\sqrt{N(N+M)} + \sqrt{N(N-M)}\right). \tag{6.8.2}$$


At the beginning of this appendix we had $M = m^2$, $N = n^2$, $N - M = p^2$, and
$N + M = q^2$ for integers m, n, p, q, and so

$$A'B' = n^2(q^2 - p^2), \quad A'C' = n^2(p + q)^2, \quad \text{and} \quad B'C' = 2n^3(p + q).$$

If we divide each side length through by $2n^2(q + p)$, then we obtain the triangle
UVW from the previous section and establish the required contradiction.
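The ruler-and-compass steps can be replayed in coordinates to check (6.8.1), (6.8.2) and the identity sin 2α = T numerically. In the sketch below (our own coordinate choice, in floating point, for arbitrary 0 < M < N, so it illustrates the formulas rather than any integer example), A is the origin, C lies on the positive x-axis and B on the positive y-axis.

```python
import math


def chan_construction(M: float, N: float):
    """Follow the construction of A'B'C' from ABC; return (A'B', A'C', B'C')."""
    AB, AC, BC = 2 * M * N, N * N - M * M, M * M + N * N
    C1 = (AC + BC, 0.0)                   # C': extend AC beyond C by CB
    r = C1[0]                             # radius AC' of the circle about A
    # The line through B parallel to AC is y = AB; take the intersection far from C'.
    B1 = (-math.sqrt(r * r - AB * AB), AB)
    A1 = (B1[0], 0.0)                     # A': foot of the perpendicular from B'
    return math.dist(A1, B1), math.dist(A1, C1), math.dist(B1, C1)


def formulas(M: float, N: float):
    """Right-hand sides of (6.8.1) and (6.8.2)."""
    return (2 * M * N,
            2 * N * (N + math.sqrt(N * N - M * M)),
            2 * N * (math.sqrt(N * (N + M)) + math.sqrt(N * (N - M))))


for M, N in [(1, 2), (2, 3), (3, 8), (5, 12)]:
    built, predicted = chan_construction(M, N), formulas(M, N)
    alpha = math.atan2(built[0], built[1])          # the angle B'C'A'
    print(all(math.isclose(x, y) for x, y in zip(built, predicted)),
          math.isclose(math.sin(2 * alpha), M / N))  # True True
```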


Exercise 6.8.1. Suppose that ABC is an integer-sided right-angled triangle. Insert a circle inside 
the triangle which is tangent to each of the sides of the triangle. Prove that the circle’s radius is 
an integer. 




Figure 6.7. A circle inscribed inside a right-angled triangle 


Appendix 6C. Can a binomial 
coefficient be a square? 


Can $\binom{n+k}{k}$ with n ≥ k be a square? We will show that if so, then k = 0, 1, 2, or 3.


6.9. Small k


Evidently $\binom{n}{0} = 1$ is always a square. Moreover $\binom{n+1}{1}$ is a square if and only if

$$n = m^2 - 1 \text{ for some integer } m \ge 1.$$


Lemma 6.9.1. The positive integers n for which $\binom{n+2}{2}$ is a square are given exactly
by the integer solutions to

$$x^2 - 2y^2 = \pm 1.$$

When the sign is −1 let $n = x^2 - 1$, and when the sign is +1 let $n = x^2 - 2$, so that
$\binom{n+2}{2} = (xy)^2$ in either case.


The integer solutions to this equation, the Pell equation, will be discussed at
length in section 11.2.


Proof. If $\binom{n+2}{2} = w^2$, then $(n+2)(n+1) = 2w^2$. Now (n + 2, n + 1) = 1, so either
$n + 2 = x^2$ and $n + 1 = 2y^2$, or $n + 2 = 2y^2$ and $n + 1 = x^2$, yielding our result.


One finds an infinite set of solutions by induction: We begin with $\binom{9}{2} = 6^2$,
which corresponds to $3^2 - 2 \cdot 2^2 = 1$ above. Now given any solution to $\binom{N}{2} = m^2$
we obtain a larger solution

$$\binom{(2N-1)^2}{2} = (4N-2)^2 \cdot \binom{N}{2} = \big((4N-2)m\big)^2.$$

We perform this operation repeatedly to find an infinite set of solutions.
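Both routes to solutions can be checked with exact integer arithmetic. The sketch below takes as given the standard fact that the positive solutions of $x^2 - 2y^2 = \pm 1$ come from the powers $(1+\sqrt{2})^j = x + y\sqrt{2}$, with the sign alternating. It converts each solution to n as in Lemma 6.9.1 and also iterates the step N ↦ (2N − 1)² from (N, m) = (9, 6). Note that this step produces only some of the solutions; it jumps from N = 9 to N = 289, skipping N = 50. The function names are ours.

```python
from math import comb


def pell_solutions(count: int):
    """First `count` solutions (x, y, sign) of x^2 - 2y^2 = sign = ±1 with x, y > 0,
    taken from the powers (1 + sqrt 2)^j = x + y sqrt 2."""
    x, y = 1, 1
    out = []
    for _ in range(count):
        out.append((x, y, x * x - 2 * y * y))
        x, y = x + 2 * y, x + y          # multiply by 1 + sqrt 2
    return out


def n_from_pell(x: int, sign: int) -> int:
    """Lemma 6.9.1: n = x^2 - 1 if sign = -1, and n = x^2 - 2 if sign = +1."""
    return x * x - 1 if sign == -1 else x * x - 2


def doubling_step(N: int, m: int) -> tuple[int, int]:
    """From binom(N, 2) = m^2 to binom((2N - 1)^2, 2) = ((4N - 2) m)^2."""
    return (2 * N - 1) ** 2, (4 * N - 2) * m


for x, y, sign in pell_solutions(6):
    n = n_from_pell(x, sign)
    print(n + 2, comb(n + 2, 2) == (x * y) ** 2)   # N = 2, 9, 50, 289, 1682, 9801

N, m = 9, 6
for _ in range(3):
    print(N, comb(N, 2) == m * m)                  # N = 9, 289, 332929
    N, m = doubling_step(N, m)
```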
For k = 3 we write t = n + 2 ∈ Z so that if $\binom{n+3}{3} = m^2$, then

$$t^3 - t = 6m^2,$$

giving the right-angled triangle with side lengths $(2t/m, (t^2 - 1)/m, (t^2 + 1)/m)$ of
area 6. We will determine the infinitely many rational-sided right-angled triangles
of area 6 in appendix 17B, obtaining the integral solution $\binom{50}{3} = 140^2$ (as well as
$\binom{3}{3} = 1^2$ and $\binom{4}{3} = 2^2$) to our problem, but no others.
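A direct search agrees with the k = 3 claim over a finite range. The sketch below (function names are ours; the bound 10000 is arbitrary) also builds the area-6 triangle attached to n = 47, that is t = 49 and m = 140, using exact fractions.

```python
from fractions import Fraction
from math import comb, isqrt


def square_tetrahedral(limit: int) -> list[int]:
    """All n in [0, limit] for which binom(n + 3, 3) is a perfect square."""
    return [n for n in range(limit + 1)
            if isqrt(comb(n + 3, 3)) ** 2 == comb(n + 3, 3)]


def area6_triangle(n: int):
    """Sides (2t/m, (t^2 - 1)/m, (t^2 + 1)/m) with t = n + 2 and m^2 = binom(n + 3, 3)."""
    t, m = n + 2, isqrt(comb(n + 3, 3))
    return Fraction(2 * t, m), Fraction(t * t - 1, m), Fraction(t * t + 1, m)


print(square_tetrahedral(10_000))   # [0, 1, 47]
a, b, c = area6_triangle(47)
print(a, b, c, a * a + b * b == c * c, a * b / 2)   # 7/10 120/7 1201/70 True 6
```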


6.10. Larger k


For larger k we can prove there are no solutions using our understanding of prime
divisors of binomial coefficients:

Theorem 6.4. There are no integers n ≥ k ≥ 4 for which $\binom{n+k}{k}$ is the square of
an integer.


Proof. We write each $n + j = a_j m_j^2$ with each $a_j$ squarefree. If $\binom{n+k}{k} = m^2$, then

$$a_1 \cdots a_k\,(m_1 \cdots m_k)^2 = k!\,m^2. \tag{6.10.1}$$

We first show that if p is a prime dividing $a_i$, then p ≤ k: Otherwise p > k, and
so the power of p dividing $k!\,m^2$ is the same as the power of p dividing $m^2$, which
is even. Therefore the power of p dividing $a_1 \cdots a_k$ is even by (6.10.1), and so p
divides an even number of the $a_j$, since each $a_j$ is squarefree. Since p divides $a_i$, this
implies that p divides $a_j$ for some j ≠ i. Therefore p divides (n + j) − (n + i) = j − i
and so p ≤ |j − i| < k, a contradiction.

By the theorem of Sylvester and Schur (Theorem 5.7) we know that some n + j
has a prime factor p > k, and so p divides $m_j$ (as p does not divide $a_j$). Therefore
$n + k \ge n + j = a_j m_j^2 \ge m_j^2 \ge p^2 \ge (k+1)^2$; that is, $n \ge k^2 + k + 1$.

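Let $g = (a_1 \cdots a_k, k!)$. Dividing g out from both sides of (6.10.1), we deduce
that $a_1 \cdots a_k / g$ is a square. Now the power of a prime p dividing $a_1 \cdots a_k$ is ≤
#{j : p | n + j} as each $a_j$ is squarefree, and this equals $\left[\frac{n+k}{p}\right] - \left[\frac{n}{p}\right]$. The power of
p dividing k! is ≥ $\left[\frac{k}{p}\right]$. Therefore the power of p dividing $a_1 \cdots a_k / g$ is

$$\le \left[\frac{n+k}{p}\right] - \left[\frac{n}{p}\right] - \left[\frac{k}{p}\right] \le 1.$$

But this must be even as $a_1 \cdots a_k / g$ is a square, and so equals 0. Therefore
$a_1 \cdots a_k / g = 1$; that is, $g = a_1 \cdots a_k$ divides k!.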


If the $a_j$ are distinct positive integers, then the smallest must be ≥ 1, the next
smallest ≥ 2, etc. Therefore $k! \le a_1 \cdots a_k \le k!$, and so the numbers $a_1, \ldots, a_k$
must be the numbers 1, 2, ..., k arranged in some order. Therefore if k ≥ 4, then
some $a_j$ equals 4, contradicting that the $a_j$ are squarefree.


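Therefore two $a_j$ are the same, say $a_i = a_j$ where i < j. Then

$$k > j - i = (n + j) - (n + i) = a_i(m_j^2 - m_i^2) \ge a_i\big((m_i + 1)^2 - m_i^2\big) > 2a_i m_i \ge 2\sqrt{n + i} > 2\sqrt{k^2 + k},$$

which is impossible, since $2\sqrt{k^2 + k} > 2k > k$.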
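As a finite sanity check of Theorem 6.4 (not a substitute for the proof), a brute-force search over a modest range finds no squares once k ≥ 4. For comparison, the search for k = 2 and 3 recovers the solutions found above. The ranges chosen below are arbitrary.

```python
from math import comb, isqrt


def square_binomials(ks, n_max: int) -> list[tuple[int, int]]:
    """Pairs (n, k) with k in ks and k <= n <= n_max for which binom(n + k, k) is a square."""
    hits = []
    for k in ks:
        for n in range(k, n_max + 1):
            v = comb(n + k, k)
            if isqrt(v) ** 2 == v:
                hits.append((n, k))
    return hits


print(square_binomials(range(4, 11), 2000))   # []
print(square_binomials([2, 3], 2000))         # [(7, 2), (48, 2), (287, 2), (1680, 2), (47, 3)]
```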


Exercise 6.10.1.† Show that $\binom{n+k}{k}$ cannot be an ℓth power for any ℓ ≥ 3.


Chapter 7 


Power residues 


We begin by calculating the least residues of the small powers of each given residue 
mod m, to look for interesting patterns: 


Least power residues (mod 2):

    a | a^0  a^1  a^2  a^3  a^4  a^5
    0 |  1    0    0    0    0    0
    1 |  1    1    1    1    1    1

Least power residues (mod 3):¹

    a | a^0  a^1  a^2  a^3  a^4  a^5
    0 |  1    0    0    0    0    0
    1 |  1    1    1    1    1    1
    2 |  1    2    1    2    1    2


In these small examples, the columns soon settle into repeating patterns as we go
from left to right: For example, in the mod 3 case, the columns alternate between
0, 1, 1 and 0, 1, 2. How about for slightly larger moduli?
Least power residues (mod 4):

    a | a^0  a^1  a^2  a^3  a^4  a^5
    0 |  1    0    0    0    0    0
    1 |  1    1    1    1    1    1
    2 |  1    2    0    0    0    0
    3 |  1    3    1    3    1    3

Least power residues (mod 5):

    a | a^0  a^1  a^2  a^3  a^4  a^5
    0 |  1    0    0    0    0    0
    1 |  1    1    1    1    1    1
    2 |  1    2    4    3    1    2
    3 |  1    3    4    2    1    3
    4 |  1    4    1    4    1    4


¹ Why did we take $0^0$ to be 1 (mod m) for m = 2, 3, 4, and 5? In mathematics we create symbols
and protocols (like taking powers) to represent numbers and actions on those numbers, and then we
need to be able to interpret all combinations of those symbols and protocols. Occasionally some of those
combinations do not have an immediate interpretation, for example $0^0$. So how do we deal with this?
Usually mathematicians develop a convenient interpretation that allows that not-well-defined use of a
protocol to nonetheless be consistent with the many appropriate uses of the protocol. Therefore, for
example, we let $0^0$ be 1, because it is true that $a^0 = 1$ for every non-zero number a, so it makes sense
(and is often convenient) to define this to also be so for a = 0.

Perhaps the best known dilemma of this sort comes in asking whether ∞ is a number. The correct
answer is “No, it is a symbol” (representing an upper bound on the set of real numbers) but it is certainly
convenient to treat it as a number in many situations.






Again the patterns repeat, every second power mod 4, and every fourth power mod 
5. Our goal in this chapter is to understand the power residues, and in particular 
when we get these repeated patterns. 
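Tables like those above are easy to generate for any modulus. The sketch below (the function name is ours) relies on Python's built-in three-argument `pow`, which returns 1 for `pow(0, 0, m)` when m > 1, matching the convention $0^0 = 1$ adopted in the footnote.

```python
def power_residue_table(m: int, max_exp: int) -> list[list[int]]:
    """Row a holds the least residues of a^0, a^1, ..., a^max_exp mod m (with 0^0 = 1)."""
    return [[pow(a, e, m) for e in range(max_exp + 1)] for a in range(m)]


for m in (4, 5):
    print(f"Least power residues (mod {m})")
    for a, row in enumerate(power_residue_table(m, 5)):
        print(a, row)
```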


7.1. Generating the multiplicative group of residues 


We begin by verifying that for each coprime pair of integers a and m, the power 
residues do repeat periodically: 


Lemma 7.1.1. For any integer a with (a, m) = 1, there exists an integer k,
1 ≤ k ≤ φ(m), for which $a^k \equiv 1 \pmod m$.

Proof. Each term of the sequence $1, a, a^2, a^3, \ldots$ is coprime with m by exercise
3.3.5. But then each is congruent to some element from any given reduced set of
residues mod m (which has size φ(m)). Therefore, by the pigeonhole principle,
there exist i and j with 0 ≤ i < j ≤ φ(m) for which $a^i \equiv a^j \pmod m$.

Next we divide both sides of this equation by $a^i$. To justify doing this we
observe that $(a^i, m) = 1$ (as (a, m) = 1) and so we can use Corollary 3.5.1 to obtain
our result with k = j − i, so that 1 ≤ k ≤ φ(m).


Exercise 7.1.1. (a) Show that for any integers a and m ≥ 2, there exist integers i and k, with
0 ≤ i ≤ m − 1 and 1 ≤ k ≤ m − i, such that $a^{n+k} \equiv a^n \pmod m$ for every n ≥ i.
(b) For each integer m ≥ 2 determine an integer a such that $a \not\equiv 1 \pmod m$ but $a^2 \equiv a$
(mod m). (This explains why we need the hypothesis that (a, m) = 1 in Lemma 7.1.1.)


Another proof of Corollary 3.5.2. [If (a, m) = 1, then a has an inverse mod
m.] Let $r = a^{k-1}$, where k is as in Lemma 7.1.1, so that $ar = a^k \equiv 1 \pmod m$.


Examples. In the geometric progression 2, 4, 8, ..., the first term ≡ 1 (mod 13) is
$2^{12} = 4096$. The first term ≡ 1 (mod 23) is $2^{11} = 2048$. Similarly $5^6 = 15625 \equiv 1$
(mod 7) but $5^5 \equiv 1$ (mod 11). We see that in some cases the power needed is as
big as φ(p) = p − 1, the bound given by Lemma 7.1.1, but not always.

If $a^k \equiv 1 \pmod m$, then $a^{k+j} \equiv a^j \pmod m$ for all j ≥ 0, and so the geometric
progression $a^0, a, a^2, \ldots$ modulo m has period k. Thus if u ≡ v (mod k), then
$a^u \equiv a^v \pmod m$. Therefore one can easily determine the residues of powers (mod m).
For example, to compute $3^{1000} \pmod{13}$, first note that $3^3 \equiv 1 \pmod{13}$. Now
1000 ≡ 1 (mod 3), and so $3^{1000} \equiv 3^1 = 3 \pmod{13}$.
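The reduction of the exponent modulo a period can be written out directly. In this sketch (the function name is ours), the period k is supplied by the caller and checked before use. The built-in three-argument `pow` is used only to evaluate the final small power and to confirm the answer.

```python
def power_via_period(a: int, e: int, m: int, k: int) -> int:
    """a^e mod m, given k >= 1 with a^k ≡ 1 (mod m): since the powers of a
    repeat with period k, only e mod k matters."""
    if k < 1 or pow(a, k, m) != 1 % m:
        raise ValueError("need k >= 1 with a^k ≡ 1 (mod m)")
    return pow(a, e % k, m)


print(power_via_period(3, 1000, 13, 3))   # 3
print(pow(3, 1000, 13))                   # 3, computed directly for comparison
```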


If (a, m) = 1, then let $\mathrm{ord}_m(a)$, the order of a (mod m), denote the smallest
positive integer k for which $a^k \equiv 1 \pmod m$. We know that there must be such an
integer, by Lemma 7.1.1. We have $\mathrm{ord}_3(2) = \mathrm{ord}_4(3) = 2$, $\mathrm{ord}_5(2) = \mathrm{ord}_5(3) = 4$
(from the tables above), and $\mathrm{ord}_{13}(2) = 12$, $\mathrm{ord}_{23}(2) = 11$, $\mathrm{ord}_7(5) = 6$, and
$\mathrm{ord}_{11}(5) = 5$ from the examples above. The powers of 3 (mod 16) are 1, 3, 9, $3^3 \equiv$
11, $3^4 \equiv 1$, 3, 9, 11, 1, 3, 9, 11, 1, ..., so that the residues are periodic with period
$\mathrm{ord}_{16}(3) = 4$.
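The definition of the order translates directly into a loop. Lemma 7.1.1 guarantees that the loop stops after at most φ(m) steps. This naive sketch (the function name is ours) is fine for small moduli. It reproduces all of the orders listed above.

```python
from math import gcd


def multiplicative_order(a: int, m: int) -> int:
    """ord_m(a): the least k >= 1 with a^k ≡ 1 (mod m). Requires m > 1 and
    gcd(a, m) = 1; Lemma 7.1.1 guarantees the loop stops with k <= phi(m)."""
    if m < 2 or gcd(a, m) != 1:
        raise ValueError("need m > 1 and gcd(a, m) = 1")
    k, power = 1, a % m
    while power != 1:
        power = power * a % m
        k += 1
    return k


examples = [(2, 3), (3, 4), (2, 5), (3, 5), (2, 13), (2, 23), (5, 7), (5, 11), (3, 16)]
print([multiplicative_order(a, m) for a, m in examples])
# [2, 2, 4, 4, 12, 11, 6, 5, 4]
```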


Lemma 7.1.2. Suppose that a and m are coprime integers with m > 1. Then n is
an integer for which $a^n \equiv 1 \pmod m$ if and only if $\mathrm{ord}_m(a)$ divides n.




Proof. Let k := $\mathrm{ord}_m(a)$ so that $a^k \equiv 1 \pmod m$. Suppose that n is an integer
for which $a^n \equiv 1 \pmod m$. There exist integers q and r such that n = qk + r where
0 ≤ r ≤ k − 1. Hence $a^r = a^n/(a^k)^q \equiv 1/1^q = 1 \pmod m$. Therefore r = 0 by the
minimality of k (from the definition of order), and so k divides n as claimed.

In the other direction, if k divides n, then $a^n = (a^k)^{n/k} \equiv 1 \pmod m$.


Exercise 7.1.2. Let k := $\mathrm{ord}_m(a)$ where (a, m) = 1.
(a) Show that $1, a, a^2, \ldots, a^{k-1}$ are distinct (mod m).
(b) Deduce that $a^j \equiv a^i \pmod m$ if and only if j ≡ i (mod k).


We see that $\mathrm{ord}_m(a)$ is the smallest period of the sequence $1, a, a^2, \ldots$ (mod m).


We wish to understand the possible values of $\mathrm{ord}_m(a)$, especially for fixed m,
as a varies over integers coprime to m. We begin by taking m = p prime. The
theory for composite m can be deduced from an understanding of the prime power
modulus case, using the Chinese Remainder Theorem as determined in detail in
section 7.18 of appendix 7B.


Theorem 7.1. If p is a prime and p does not divide a, then $\mathrm{ord}_p(a)$ divides p − 1.


Proof. Let k := $\mathrm{ord}_p(a)$ and $A = \{1, a, a^2, \ldots, a^{k-1} \pmod p\}$. For any non-zero b
(mod p) define the set $bA = \{b\alpha \pmod p : \alpha \in A\}$.

Let b and b' be any two reduced residues mod p. We now show that either bA
and b'A are disjoint or they are equal: If they have an element, c, in common, then
there exist 0 ≤ i, j ≤ k − 1 such that $ba^i \equiv c \equiv b'a^j \pmod p$. Therefore $b' \equiv ba^h$
(mod p) where h is the least non-negative residue of i − j (mod k). Hence

$$b'a^\ell \equiv \begin{cases} ba^{h+\ell} \pmod p & \text{if } 0 \le \ell \le k-1-h, \\ ba^{h+\ell-k} \pmod p & \text{if } k-h \le \ell \le k-1, \end{cases}$$

which implies that $b'A \subseteq bA$. Since the two sets are finite and of the same size they
must be identical.

Since any two sets of the form bA are either identical or disjoint, we deduce
that they partition the non-zero elements mod p. That is, the reduced residues
1, ..., p − 1 (mod p) may be partitioned into disjoint cosets bA of A, each of which
has size |A| (as multiplication by b is invertible mod p); and therefore |A| = k divides p − 1.


To highlight this proof let a = 5 and p = 13 so that $A = \{1, 5, 5^2 \equiv 12, 5^3 \equiv 8 \pmod{13}\}$.
Then the cosets A, $2A = \{2, 10, 11, 3 \pmod{13}\}$, and $4A = \{4, 7, 9, 6 \pmod{13}\}$
partition the reduced residues mod 13, and therefore 3|A| = 12. Also note that
$7A = \{7, 9, 6, 4 \pmod{13}\} = 4A$, as claimed, the same residues but in a rotated order.
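The coset partition in the proof of Theorem 7.1 can be computed for any small example. The sketch below (the function name is ours) builds A from successive powers of a. It then keeps each coset bA whose first element b has not yet been covered, and reproduces the a = 5, p = 13 example.

```python
def cosets_of_powers(a: int, p: int):
    """Return A = [1, a, a^2, ...] (mod p) and the distinct cosets bA that
    partition the reduced residues 1, ..., p - 1, as in the proof of Theorem 7.1."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    A, x = [1], a % p
    while x != 1:
        A.append(x)
        x = x * a % p
    parts, seen = [], set()
    for b in range(1, p):
        if b not in seen:
            coset = [b * alpha % p for alpha in A]
            parts.append(coset)
            seen.update(coset)
    return A, parts


A, parts = cosets_of_powers(5, 13)
print(A)       # [1, 5, 12, 8]
print(parts)   # [[1, 5, 12, 8], [2, 10, 11, 3], [4, 7, 9, 6]]
```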


7.2. Fermat’s Little Theorem 


Theorem 7.1 limits the possible values of $\mathrm{ord}_p(a)$. The beauty of the proof of
Theorem 7.1, which is taken from Gauss’s Disquisitiones Arithmeticae, is that it
works in any finite group, as we will see in Proposition 7.22.1 of appendix 7D.² This
result leads us directly to one of the great results of elementary number theory, first
observed by Fermat in a letter to Frénicle on October 18, 1640:

² What is especially remarkable is that Gauss produced this surprising proof before anyone had
thought up the abstract notion of a group!


Theorem 7.2 (Fermat’s “Little” Theorem). If p is a prime and a is an integer
that is not divisible by p, then

$$p \text{ divides } a^{p-1} - 1.$$

Proof. We know that $\mathrm{ord}_p(a)$ divides p − 1 by Theorem 7.1, and therefore $a^{p-1} \equiv 1$
(mod p) by Lemma 7.1.2.


Here is a useful reformulation of Fermat’s “Little” Theorem:

Fermat’s Little Theorem, v2. If p is a prime and a is a positive integer, then
p divides $a^p - a$.

Exercise 7.2.1. Prove that our two versions of Fermat’s Little Theorem are equivalent to each
other (that is, easily imply one another).


We now present several different proofs of Fermat’s “Little” Theorem and then 
a surprising proof in appendix 7A. 


“Sets of reduced residues” proof. In exercise 3.5.2 we proved that {a · 1,
a · 2, ..., a · (p − 1)} form a reduced set of residues mod p. The residues of these
integers mod p are therefore the same as the residues of {1, 2, ..., p − 1} although
in a different order. Since the two sets are the same mod p, the products of the
elements of each set are equal mod p, and so

$$(a \cdot 1)(a \cdot 2) \cdots (a \cdot (p-1)) \equiv 1 \cdot 2 \cdots (p-1) \pmod p;$$

that is,

$$a^{p-1} \cdot (p-1)! \equiv (p-1)! \pmod p.$$

As (p, (p − 1)!) = 1, we can divide the (p − 1)! out from both sides to obtain the
desired

$$a^{p-1} \equiv 1 \pmod p.$$


Euler’s 1741 proof. We shall show that $a^p - a$ is divisible by p for every integer
a ≥ 1. We proceed by induction on a: For a = 1 we have $1^p - 1 = 0$, and so the
result is trivial. For the induction step, by the binomial theorem,

$$(a+1)^p - a^p - 1 = \sum_{i=1}^{p-1} \binom{p}{i} a^i \equiv 0 \pmod p,$$

as p divides the numerator but not the denominator of $\binom{p}{i} = \frac{p!}{i!\,(p-i)!}$ for each i, 1 ≤ i ≤ p − 1
(as in exercise 2.5.8). Reorganizing we obtain

$$(a+1)^p - (a+1) \equiv (a^p + 1) - (a + 1) = a^p - a \equiv 0 \pmod p,$$

the last congruence following from the induction hypothesis.




Combinatorial proof. The numerator, but not the denominator, of the multinomial
coefficient $\binom{p}{i,\,j,\,k,\,\ldots} = \frac{p!}{i!\,j!\,k!\cdots}$ is divisible by p unless one of i, j, k, ... equals p and the
others equal 0. In this case the multinomial coefficient equals 1. Therefore, by the
multinomial theorem,³

$$(a + b + c + \cdots)^p \equiv a^p + b^p + c^p + \cdots \pmod p.$$

Taking a = b = c = ⋯ = 1, with ℓ terms in the sum, gives $\ell^p \equiv \ell \pmod p$ for all integers ℓ ≥ 1.


Another proof of Theorem 7.1 follows from Fermat’s Little Theorem and Lemma 7.1.2
with m = p and n = p − 1. (This is not a circular argument as our last three proofs of
Fermat’s Little Theorem do not use Theorem 7.1.)


We can use Fermat’s Little Theorem to help quickly determine large powers in
modular arithmetic. For example, for $2^{1000001} \pmod{31}$, we have $2^{30} \equiv 1$
(mod 31) by Fermat’s Little Theorem, and so, as 1000001 ≡ 11 (mod 30), we
obtain $2^{1000001} \equiv 2^{11} \pmod{31}$ and it remains to do the final calculation. However,
using the order makes this calculation significantly easier: Since $\mathrm{ord}_{31}(2) = 5$
we have $2^5 \equiv 1 \pmod{31}$ and therefore, as 1000001 ≡ 1 (mod 5), we obtain
$2^{1000001} \equiv 2^1 = 2 \pmod{31}$.


It is worth stating the converse to Fermat’s Little Theorem: 


Corollary 7.2.1. If (a, n) = 1 and $a^{n-1} \not\equiv 1 \pmod n$, then n is composite.


For example (2, 15) = 1 and $2^4 = 16 \equiv 1 \pmod{15}$ so that $2^{14} \equiv 2^2 = 4$
(mod 15). Hence 15 is a composite number. The surprise here is that we have
proved that 15 is composite without having to factor 15. Indeed whenever Corollary
7.2.1 is applicable we will not have to factor n to show that it is composite. This is
important because we do not know a fast way to factor an arbitrarily large integer
n, but one can compute rapidly with Corollary 7.2.1 (as discussed in section 7.13
of appendix 7A). We will discuss such compositeness tests in section 
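Corollary 7.2.1 becomes a one-line test in code. In this sketch (the function name is ours), the built-in three-argument `pow` performs modular exponentiation by repeated squaring, which is what makes the check fast even when n is very large. A `False` result proves nothing. For example, 341 = 11 · 31 is composite but satisfies $2^{340} \equiv 1 \pmod{341}$, and the base 3 exposes it.

```python
from math import gcd


def fermat_proves_composite(n: int, a: int) -> bool:
    """Corollary 7.2.1: if gcd(a, n) = 1 and a^(n-1) is not ≡ 1 (mod n), then n is
    composite. False proves nothing: n may be prime, or composite but 'fooling' base a."""
    return gcd(a, n) == 1 and pow(a, n - 1, n) != 1


print(pow(2, 14, 15), fermat_proves_composite(15, 2))                     # 4 True
print(fermat_proves_composite(13, 2))                                     # False
print(fermat_proves_composite(341, 2), fermat_proves_composite(341, 3))   # False True
```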


Exercise 7.2.2. Prove that for any m > 1 if (a, m) = 1, then $\mathrm{ord}_m(a)$ divides φ(m) (by an
analogous proof to that of Theorem 7.1).


Theorem 7.3 (Euler’s Theorem). For any m > 1 if (a, m) = 1, then $a^{\phi(m)} \equiv 1 \pmod m$.

Proof. By definition $a^{\mathrm{ord}_m(a)} \equiv 1 \pmod m$. By exercise 7.2.2 there exists an
integer k for which $\phi(m) = k\,\mathrm{ord}_m(a)$ and so $a^{\phi(m)} = \left(a^{\mathrm{ord}_m(a)}\right)^k \equiv 1 \pmod m$.


This result and proof generalize even further, to any finite group, as we will
see in Corollary 7.23.1 of appendix 7D.
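Euler's Theorem is easy to confirm exhaustively for small moduli. The sketch below (function names are ours) computes φ(m) by directly counting reduced residues, which is adequate only for small m. It then checks $a^{\phi(m)} \equiv 1$ for every reduced residue a. The range of m is an arbitrary choice.

```python
from math import gcd


def phi(m: int) -> int:
    """Euler's phi function by direct count (adequate for small m)."""
    return sum(1 for r in range(1, m + 1) if gcd(r, m) == 1)


def euler_theorem_holds(m: int) -> bool:
    """Check a^phi(m) ≡ 1 (mod m) for every a in 1..m-1 with gcd(a, m) = 1."""
    f = phi(m)
    return all(pow(a, f, m) == 1 for a in range(1, m) if gcd(a, m) == 1)


print(phi(12), phi(13), phi(16))                           # 4 12 8
print(all(euler_theorem_holds(m) for m in range(2, 300)))  # True
```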


Exercise 7.2.3. Prove Euler’s Theorem using the idea in the “sets of reduced residues” proof of 
Fermat’s Little Theorem, given above. 


Exercise 7.2.4. Determine the last decimal digit of 3°64. 


³For the reader who has seen it before.


240 7. Power residues 


7.3. Special primes and orders 

We now look at prime divisors of the Mersenne and Fermat numbers using our 
results on orders. 

Exercise 7.3.1. Show that if $p$ is prime and $q$ is a prime dividing $2^p - 1$, then $\mathrm{ord}_q(2) = p$.


Hence, by exercise 7.3.1, if $q$ divides $2^p - 1$, then $p = \mathrm{ord}_q(2)$ divides $q - 1$ by
Theorem 7.1; that is, $q \equiv 1 \pmod p$.
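A quick numerical illustration: factor a few Mersenne numbers $2^p - 1$ and look at each prime factor mod $p$. The trial-division helper below is our own and only suitable for small inputs.

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n by trial division (small n only)."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


# For prime p, every prime q dividing 2^p - 1 should satisfy q = 1 (mod p).
for p in (11, 23, 29):
    qs = prime_factors(2 ** p - 1)
    print(p, qs, [q % p for q in qs])
```

For instance $2^{11} - 1 = 2047 = 23 \cdot 89$, and both $23$ and $89$ are $\equiv 1 \pmod{11}$.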


Another proof that there are infinitely many primes. If $p$ is the largest
prime, let $q$ be a prime factor of $2^p - 1$. We have just seen that $p$ divides $q - 1$, so
that $p \le q - 1 < q$. This contradicts the assumption that $p$ is the largest prime.


Exercise 7.3.2.† Show that if prime $p$ divides $F_n = 2^{2^n} + 1$, then $\mathrm{ord}_p(2) = 2^{n+1}$. Deduce that
$p \equiv 1 \pmod{2^{n+1}}$.


Theorem 7.4. Fix $k \ge 2$. There are infinitely many primes $\equiv 1 \pmod{2^k}$.


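Proof. If $p_n$ is a prime factor of $F_n = 2^{2^n} + 1$, then $p_n \equiv 1 \pmod{2^{n+1}}$ by exercise 7.3.2, and so $p_n \equiv 1 \pmod{2^k}$ for all $n \ge k - 1$ (as then $2^k$ divides $2^{n+1}$). We saw that the $p_n$ are all distinct in section 5.1, so this gives infinitely many such primes.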
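For example, Euler's factorization $F_5 = 641 \cdot 6700417$ fits the prediction of exercise 7.3.2 that every prime factor of $F_5$ is $\equiv 1 \pmod{64}$. A minimal sketch using naive trial division (our own helper; it would be far too slow for much larger Fermat numbers):

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n by trial division (small n only)."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


n = 5
F = 2 ** (2 ** n) + 1              # F_5 = 4294967297
for p in prime_factors(F):
    # ord_p(2) = 2^(n+1) = 64: 2^64 = 1 but 2^32 = -1 (mod p), hence p = 1 (mod 64)
    print(p, pow(2, 2 ** (n + 1), p) == 1, pow(2, 2 ** n, p) == p - 1, p % 2 ** (n + 1))
```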


7.4. Further observations 


An earlier lemma, a weak form of the Fundamental Theorem of Algebra (Theorem 3.11),
states that any polynomial in $\mathbb{C}[z]$ of degree $d$ has at most $d$ roots. An analogous
result can be proved for polynomials mod $p$.


Proposition 7.4.1 (Lagrange). Suppose that p is a prime and that f(x) is a non- 
zero polynomial with coefficients in Z/pZ of degree d. Then f(x) has no more than 
d roots mod p (counted with multiplicity). 


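Proof. By induction on $d \ge 0$. This is trivial for $d = 0$. For $d \ge 1$, if $f$ has no roots mod $p$ there is nothing to prove, so we may suppose that $f(a) \equiv 0 \pmod p$ for some integer $a$. Then write $f(x) = \sum_{i=0}^{d} f_i x^i$ and define

$$g(x) = \frac{f(x) - f(a)}{x - a} = \sum_{i=0}^{d} f_i \, \frac{x^i - a^i}{x - a} = \sum_{i=1}^{d} f_i \left( x^{i-1} + a x^{i-2} + \cdots + a^{i-1} \right),$$

a polynomial of degree $d - 1$ with leading coefficient $f_d$ (so is non-zero). Therefore
$g(x)$ has no more than $d - 1$ roots mod $p$, by the induction hypothesis. Now

$$f(x) \equiv f(x) - f(a) = (x - a)\, g(x) \pmod p,$$

and so if $f(b) \equiv 0 \pmod p$, then $(b - a) g(b) \equiv 0 \pmod p$. Since $p$ is prime, either $b \equiv a \pmod p$
or $g(b) \equiv 0 \pmod p$, and so $f$ has no more than $1 + (d - 1) = d$ roots mod $p$.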
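The primality of $p$ is used exactly once, to conclude that $b \equiv a$ or $g(b) \equiv 0$. A brute-force root counter (our own sketch) shows both the bound for a prime modulus and how it fails for a composite one.

```python
def roots_mod(coeffs: list[int], n: int) -> list[int]:
    """Residues x in [0, n) with f(x) = 0 (mod n), where
    f(x) = coeffs[0] + coeffs[1] x + coeffs[2] x^2 + ... (brute force)."""
    return [x for x in range(n)
            if sum(c * pow(x, i, n) for i, c in enumerate(coeffs)) % n == 0]


# Prime modulus: x^2 - 1 has at most 2 roots (Proposition 7.4.1).
print(roots_mod([-1, 0, 1], 13))   # [1, 12]
# Composite modulus: the bound fails, since x^2 - 1 has 4 roots mod 8.
print(roots_mod([-1, 0, 1], 8))    # [1, 3, 5, 7]
```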


Fermat’s Little Theorem implies that $1, 2, 3, \ldots, p-1$ are $p - 1$ distinct roots of
$x^{p-1} - 1 \pmod p$, and are therefore all the roots, by Proposition 7.4.1. Therefore
the polynomials $x^{p-1} - 1$ and $(x-1)(x-2)\cdots(x-(p-1))$ mod $p$ are the same
up to a multiplicative constant. Since they are both monic they must be identical;
that is,

$$x^{p-1} - 1 \equiv (x-1)(x-2)\cdots(x-(p-1)) \pmod p, \tag{7.4.1}$$




which implies that
$$x^p - x \equiv x(x-1)(x-2)\cdots(x-(p-1)) \pmod p.$$
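Identity (7.4.1) can be checked coefficient by coefficient for small primes by multiplying out the product modulo $p$. A minimal sketch with our own list-of-coefficients helpers (constant term first):

```python
def poly_mul_mod(f: list[int], g: list[int], p: int) -> list[int]:
    """Multiply coefficient lists (constant term first) modulo p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % p
    return h


def falling_product_mod(p: int) -> list[int]:
    """Coefficients of (x-1)(x-2)...(x-(p-1)) reduced mod p."""
    prod = [1]
    for a in range(1, p):
        prod = poly_mul_mod(prod, [-a % p, 1], p)
    return prod


print(falling_product_mod(7))   # [6, 0, 0, 0, 0, 0, 1], that is, x^6 - 1 mod 7
```

The constant term of the product is $(-1)^{p-1}(p-1)!$, so reading it off is exactly the proof of Wilson's Theorem that follows.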


Theorem 7.5 (Wilson’s Theorem). For any prime $p$ we have


$$(p-1)! \equiv -1 \pmod p.$$


Proof. Take $x = 0$ in (7.4.1) to obtain $-1 \equiv (-1)^{p-1}(p-1)! \pmod p$, and note that $(-1)^{p-1} \equiv 1 \pmod p$.


Gauss’s proof of Wilson’s Theorem. Let $S$ be the set of pairs $(a,b)$ for which
$1 < a < b < p$ and $ab \equiv 1 \pmod p$; that is, every residue is paired up with its
inverse unless it equals its inverse. Now if $a \equiv a^{-1} \pmod p$, then $a^2 \equiv 1 \pmod p$,
in which case $a \equiv 1$ or $p - 1 \pmod p$ by Lemma 3.8.1. Therefore

$$1 \cdot 2 \cdots (p-1) = 1 \cdot (p-1) \cdot \prod_{(a,b) \in S} ab \equiv 1 \cdot (-1) \cdot \prod_{(a,b) \in S} 1 = -1 \pmod p.$$


Example. For $p = 13$ we have
$$12! = 12 \cdot (2 \times 7)(3 \times 9)(4 \times 10)(5 \times 8)(6 \times 11) \equiv -1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 = -1 \pmod{13}.$$
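The pairing used in Gauss's proof can be listed mechanically. A small sketch, assuming Python 3.8 or later for the modular inverse `pow(a, -1, p)`; the function name is our own.

```python
def inverse_pairs(p: int) -> list[tuple[int, int]]:
    """Pairs (a, b) with 1 < a < b < p - 1 and a*b = 1 (mod p), for prime p."""
    pairs = []
    for a in range(2, p - 1):
        b = pow(a, -1, p)          # inverse of a modulo p (Python 3.8+)
        if a < b:
            pairs.append((a, b))
    return pairs


print(inverse_pairs(13))   # [(2, 7), (3, 9), (4, 10), (5, 8), (6, 11)]
```

Every residue other than $1$ and $p - 1$ appears in exactly one pair, which is why there are $(p-3)/2$ pairs.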
Exercise 7.4.1. (a) Show that if $n > 4$ is composite, then $n$ divides $(n-1)!$.


(b) Show that $n \ge 2$ is prime if and only if $n$ divides $(n-1)! + 1$.


Combining Wilson’s Theorem with the last exercise, we have an indirect primality test for integers $n \ge 2$: compute $(n-1)! \pmod n$. If it is $\equiv -1 \pmod n$, then $n$ is prime; otherwise $n$ is composite (and if $n > 4$, then $(n-1)! \equiv 0 \pmod n$). Note, however, that in determining $(n-1)!$ we need to do $n - 2$ multiplications, so this primality test takes far more steps than trial division (see section 5.2)!
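For concreteness, the Wilson test looks like this in Python (a sketch, with our own function name); the loop performs the $n - 2$ multiplications mentioned above, compared with about $\sqrt{n}$ divisions for trial division.

```python
def wilson_is_prime(n: int) -> bool:
    """Primality via Wilson's Theorem: n >= 2 is prime iff (n-1)! = -1 (mod n).
    Uses n - 2 modular multiplications, so it is much slower than trial division."""
    f = 1
    for k in range(2, n):
        f = (f * k) % n
    return f == n - 1


print([n for n in range(2, 40) if wilson_is_prime(n)])
```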


Exercise 7.4.2. (a) Use the idea in Gauss’s proof of Wilson’s Theorem to show that


$$\prod_{\substack{1 \le a \le n \\ (a,n) = 1}} a \equiv \prod_{\substack{1 \le b \le n \\ b^2 \equiv 1 \pmod n}} b \pmod n.$$


(b) Evaluate this product using exercise 3.8.3 or by pairing $b$ with $n - b$.


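Exercise 7.4.3. (a) Show that $\left(\left(\frac{p-1}{2}\right)!\right)^2 \equiv (-1)^{(p+1)/2} \pmod p$.

(b) Deduce that if $p \equiv 3 \pmod 4$, then $\left(\frac{p-1}{2}\right)! \equiv 1$ or $-1 \pmod p$.

(c) Deduce that if $p \equiv 1 \pmod 4$, then $\left(\frac{p-1}{2}\right)!$ is a root of $x^2 \equiv -1 \pmod p$.⁴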


7.5. The number of elements of a given order, and primitive roots 


In Theorem 7.1 we saw that the order modulo $p$ of any integer $a$ which is coprime
to $p$ must be an integer which divides $p - 1$. In this section we show that for each
divisor $m$ of $p - 1$, there are residue classes mod $p$ of order $m$.


⁴This explicitly provides a square root of $-1 \pmod p$, which is interesting, as there is no easy way
in general to determine square roots mod $p$. However, we do not know how to rapidly calculate the least
residue of $\left(\frac{p-1}{2}\right)! \pmod p$.
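For small primes $p \equiv 1 \pmod 4$ one can simply compute this factorial and confirm that its square is $-1$. The sketch below (our own function name) does exactly the slow computation the footnote warns about, so it is only practical for small $p$.

```python
from math import factorial


def sqrt_minus_one(p: int) -> int:
    """For prime p = 1 (mod 4), return ((p-1)/2)! mod p, a root of x^2 = -1 (mod p).
    Computing the factorial takes about p/2 multiplications, so this is slow for large p."""
    assert p % 4 == 1
    return factorial((p - 1) // 2) % p


for p in (5, 13, 17, 29):
    r = sqrt_minus_one(p)
    print(p, r, (r * r) % p == p - 1)
```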




Example. For the primes $p = 13$ and $p = 19$ we have


Order    a (mod 13)         Order    a (mod 19)
  1      1                    1      1
  2      12                   2      18
  3      3, 9                 3      7, 11
  4      5, 8                 6      8, 12
  6      4, 10                9      4, 5, 6, 9, 16, 17
 12      2, 6, 7, 11         18      2, 3, 10, 13, 14, 15
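The table can be regenerated by brute force, which also lets one test the guess below for other primes. A sketch with our own helper names; `order` is a naive search, not an efficient algorithm.

```python
from collections import defaultdict
from math import gcd


def order(a: int, p: int) -> int:
    """Least k >= 1 with a^k = 1 (mod p), for p not dividing a."""
    k, x = 1, a % p
    while x != 1:
        x, k = (x * a) % p, k + 1
    return k


def residues_by_order(p: int) -> dict[int, list[int]]:
    """Map each order m to the residues 1 <= a < p having order m."""
    table = defaultdict(list)
    for a in range(1, p):
        table[order(a, p)].append(a)
    return dict(sorted(table.items()))


def phi(m: int) -> int:
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


for p in (13, 19):
    t = residues_by_order(p)
    print(p, t)
    # The counts match phi(m) for every order m that occurs.
    assert all(len(v) == phi(m) for m, v in t.items())
```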


How many residues are there of each order? From these examples we might guess 
the following result. 


Theorem 7.6. If $m$ divides $p - 1$, then there are exactly $\phi(m)$ elements $a \pmod p$
of order $m$.


A primitive root $a$ mod $p$ is a reduced residue mod $p$ of order $p - 1$. The least
residues of the powers

$$1, a, a^2, a^3, \ldots, a^{p-2} \pmod p$$

are distinct reduced residues by an earlier exercise, and so must equal $1, 2, \ldots, p - 1$ in some order. Therefore every reduced residue is congruent to some power $a^j \pmod p$ of $a$, and the power $j$ can be reduced mod $p - 1$. For example, 2, 3, 10, 13, 14, and 15 are the primitive roots mod 19. We can verify that the powers of 3 mod 19 yield a reduced set of residues:

$$3^0, 3^1, 3^2, \ldots, 3^{18} \equiv 1, 3, 9, 8, 5, -4, 7, 2, 6, -1, -3, -9, -8, -5, 4, -7, -2, -6, 1 \pmod{19},$$

respectively, so 3 is a primitive root mod 19. Taking $m = p - 1$ in Theorem 7.6 we
obtain the following:


Corollary 7.5.1. For every prime $p$ there exists a primitive root mod $p$. In fact
there are $\phi(p-1)$ distinct primitive roots mod $p$.
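The list of powers of 3 mod 19 above, written with residues in $(-p/2, p/2]$, and the full list of primitive roots mod 19 can both be reproduced in a few lines. A brute-force sketch (our own helper names):

```python
def symmetric_residue(x: int, p: int) -> int:
    """Representative of x mod p in the range -p/2 < r <= p/2 (p odd)."""
    r = x % p
    return r - p if r > p // 2 else r


p, g = 19, 3
powers = [symmetric_residue(pow(g, j, p), p) for j in range(p)]   # 3^0, ..., 3^18
print(powers)

# Brute force: a is a primitive root iff its powers give all p - 1 nonzero residues.
prim_roots = [a for a in range(1, p) if len({pow(a, j, p) for j in range(p - 1)}) == p - 1]
print(prim_roots)   # phi(18) = 6 of them
```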


To prove Theorem 7.6 it helps to first establish the following lemma:


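Lemma 7.5.1. If $m$ divides $p - 1$, then there are exactly $m$ elements $a \pmod p$
for which $a^m \equiv 1 \pmod p$.


Proof. We saw in (7.4.1) that
$$x^{p-1} - 1 = (x^m - 1)\left(x^{p-1-m} + x^{p-1-2m} + \cdots + x^m + 1\right)$$
factors into distinct linear factors mod $p$, and therefore $x^m - 1$ does so also (by unique factorization in $(\mathbb{Z}/p\mathbb{Z})[x]$, any monic factor of a product of distinct monic linear factors is a product of some of them). Hence $x^m - 1$ has exactly $m$ distinct roots mod $p$.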


The residue $a \pmod p$ is counted in Lemma 7.5.1 if and only if the order of $a$
divides $m$. Now we prove Theorem 7.6, which counts the number of residue classes
$a \pmod p$ whose order is exactly $m$.




Proof of Theorem 7.6. Let $\psi(d)$ denote the number of elements $a \pmod p$ of
order $d$. The set of roots of $x^m - 1 \pmod p$ is precisely the union of the sets of
residue classes mod $p$ of order $d$, over each $d$ dividing $m$, so that

$$\sum_{d \mid m} \psi(d) = m \tag{7.5.1}$$

for all positive integers $m$ dividing $p - 1$, by Lemma 7.5.1. We now prove that
$\psi(m) = \phi(m)$ for all $m$ dividing $p - 1$, by induction on $m$. The only element of
order 1 is $1 \pmod p$, so that $\psi(1) = 1 = \phi(1)$. For $m > 1$ we have $\psi(d) = \phi(d)$ for
all $d < m$ that divide $m$, by the induction hypothesis. Therefore

$$\psi(m) = m - \sum_{\substack{d \mid m \\ d < m}} \psi(d) = m - \sum_{\substack{d \mid m \\ d < m}} \phi(d) = \phi(m),$$

the last equality following from Proposition 4.1.1 (which gives $\sum_{d \mid m} \phi(d) = m$). The result follows.


Although there are many primitive roots mod $p$ ($\phi(p-1)$ of them by Theorem
7.6), it is not obvious how to always find one rapidly. In section 7.15 of appendix
7B we will present Gauss’s practical algorithm for finding primitive roots (as well
as special cases in exercises 8.9.20, 8.9.21, and 8.9.22).


It is believed that 2 is a primitive root mod $p$ for infinitely many primes $p$,
though this remains an open question. Artin’s primitive root conjecture states that
every prime $q$ is a primitive root mod $p$ for infinitely many primes $p$. This is known
to be true for all but at most two primes.⁵ Gauss himself conjectured that 10 is a
primitive root mod $p$ for infinitely many primes $p$, and this is also an open question.
Any integer $m$ which is neither a perfect square nor $-1$ is conjectured to be a
primitive root mod $p$ for infinitely many primes $p$.


Corollary 7.5.2. For every prime $p$ and every integer $k$, we have

$$1^k + 2^k + \cdots + (p-1)^k \equiv \begin{cases} 0 & \text{if } p - 1 \nmid k, \\ p - 1 \equiv -1 & \text{if } p - 1 \mid k \end{cases} \pmod p.$$


Proof. Let $S_k := 1^k + 2^k + \cdots + (p-1)^k$. If $p - 1$ divides $k$, then each $j^k \equiv 1 \pmod p$ by Fermat’s Little Theorem, and so $S_k \equiv 1 + \cdots + 1 = p - 1 \pmod p$, as
claimed. So, henceforth, assume that $p - 1$ does not divide $k$.


Let $a$ be a primitive root mod $p$, so that $a^k \not\equiv 1 \pmod p$ since $p - 1$ does not
divide $k$. The integers $\{a \cdot 1, a \cdot 2, \ldots, a \cdot (p-1)\}$ form a reduced set of residues mod
$p$ and so are a rearrangement of the residues of $\{1, 2, \ldots, p-1\}$ mod $p$. Therefore
any symmetric function takes congruent values mod $p$ on these two sets of integers
(as we saw in the “sets of reduced residues” proof of Fermat’s Little Theorem); in
particular,

$$S_k \equiv \sum_{j=1}^{p-1} (aj)^k = a^k S_k \pmod p.$$

Therefore $(a^k - 1) S_k \equiv 0 \pmod p$, but $a^k \not\equiv 1 \pmod p$, and so $S_k \equiv 0 \pmod p$.
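A brute-force check of Corollary 7.5.2 for small primes and exponents $k \ge 0$ (a sketch; the function name is our own):

```python
def power_sum_mod(k: int, p: int) -> int:
    """S_k = 1^k + 2^k + ... + (p-1)^k reduced mod p, for k >= 0."""
    return sum(pow(j, k, p) for j in range(1, p)) % p


p = 11
print([power_sum_mod(k, p) for k in range(21)])
# Expect p - 1 = 10 at k = 0, 10, 20 (the multiples of p - 1) and 0 elsewhere.
```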


⁵This result is strangely formulated because of the nature of what was proved (by Heath-Brown
[2], improving a result of Gupta and Murty; see [3]): that in any set of three distinct primes $q_1, q_2, q_3$,
at least one is a primitive root mod $p$ for infinitely many primes $p$. Therefore there cannot be three
exceptions to the conjecture, and we believe that there are none.




Near the beginning of this section we noted that if $a$ is a primitive root (mod $p$),
then every reduced residue is congruent to some power $a^j \pmod p$. This property
is extremely useful, for it allows us to treat multiplication as addition of exponents,
in the same way that the introduction of logarithms simplifies usual multiplication.
We will discuss this further in section 7.16 of appendix 7B.
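Here is a small illustration of "multiplication as addition of exponents" mod 19 with primitive root 3: build a table of exponents (discrete logarithms, here called indices) and multiply by adding them mod 18. This is only a sketch for a tiny prime; `log`, `antilog`, and `mul_via_logs` are our own names.

```python
p, g = 19, 3                                      # 3 is a primitive root mod 19
log = {pow(g, j, p): j for j in range(p - 1)}     # residue -> exponent j with g^j = residue
antilog = {j: r for r, j in log.items()}          # exponent -> residue


def mul_via_logs(a: int, b: int) -> int:
    """Multiply two reduced residues mod p by adding their exponents mod p - 1."""
    return antilog[(log[a % p] + log[b % p]) % (p - 1)]


print(mul_via_logs(7, 13), (7 * 13) % p)   # both give 15
```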


Exercise 7.5.1. Write each reduced residue mod $p$ as a power of the primitive root $a$, and use
this to evaluate $1^k + 2^k + \cdots + (p-1)^k \pmod p$ as a function of $a$ and $k$. Use this to give another
proof of Corollary 7.5.2.


Exercise 7.5.2. Let $g$ be a primitive root modulo odd prime $p$.
(a) Prove that $g^a \equiv 1 \pmod p$ if and only if $p - 1$ divides $a$.
(b) Show that $g^{(p-1)/2} \equiv -1 \pmod p$.


In order to determine the order of an element mod n, one can use the following 
result: 


Proposition 7.5.1. Suppose that $a$ and $n$ are coprime integers. Then $d$ is the
order of $a \pmod n$ if and only if $a^d \equiv 1 \pmod n$ and $a^{d/q} \not\equiv 1 \pmod n$ for every
prime $q$ dividing $d$.


Proof. If $d$ is the order of $a \pmod n$, then $a^d \equiv 1 \pmod n$ and $a^{d/q} \not\equiv 1 \pmod n$
by the definition of order, since $d/q < d$.

On the other hand, let $m := \mathrm{ord}_n(a)$. By Lemma 7.1.2 we know that $m$ divides
$d$ but does not divide $d/q$ for any prime $q$ dividing $d$. Therefore $q$ does not divide
$d/m$ for any prime $q$ dividing $d$ (if $q$ divided $d/m$, then $m$ would divide $d/q$), so there cannot be any primes $q$ that divide $d/m$.
This implies that $d/m = 1$, and so $\mathrm{ord}_n(a) = m = d$.
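Proposition 7.5.1 turns "is $d$ the order?" into one check for each prime factor of $d$. A sketch of that test, with our own helper names and naive trial division for factoring $d$:

```python
def prime_divisors(n: int) -> list[int]:
    """Distinct primes dividing n (trial division; n >= 1)."""
    ps, q = [], 2
    while q * q <= n:
        if n % q == 0:
            ps.append(q)
            while n % q == 0:
                n //= q
        q += 1
    if n > 1:
        ps.append(n)
    return ps


def is_order(a: int, n: int, d: int) -> bool:
    """Proposition 7.5.1 (a, n coprime): d = ord_n(a) iff a^d = 1 (mod n)
    and a^(d/q) != 1 (mod n) for every prime q dividing d."""
    return pow(a, d, n) == 1 and all(pow(a, d // q, n) != 1 for q in prime_divisors(d))


print(is_order(2, 31, 5), is_order(2, 51, 8), is_order(2, 51, 16))   # True True False
```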


We deduce an important practical way to recognize primitive roots mod p: 


Corollary 7.5.3. Suppose that $p$ is a prime that does not divide the integer $a$. Then
$a$ is a primitive root (mod $p$) if and only if

$$a^{(p-1)/q} \not\equiv 1 \pmod p$$

for all primes $q$ dividing $p - 1$.


Proof. By definition, $a$ is a primitive root (mod $p$) if and only if $m := \mathrm{ord}_p(a) = p - 1$. Since $a^{p-1} \equiv 1 \pmod p$ by Fermat’s Little Theorem, the result follows from Proposition 7.5.1 with $n = p$ and $d = p - 1$.
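Corollary 7.5.3 gives a simple way to search for a primitive root: test $a = 2, 3, \ldots$ until one passes. This naive search is our own illustration, not Gauss's algorithm from appendix 7B; it assumes $p$ is an odd prime small enough to factor $p - 1$ by trial division.

```python
def prime_divisors(n: int) -> list[int]:
    """Distinct primes dividing n (trial division; n >= 1)."""
    ps, q = [], 2
    while q * q <= n:
        if n % q == 0:
            ps.append(q)
            while n % q == 0:
                n //= q
        q += 1
    if n > 1:
        ps.append(n)
    return ps


def is_primitive_root(a: int, p: int) -> bool:
    """Corollary 7.5.3, for a prime p not dividing a."""
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_divisors(p - 1))


def least_primitive_root(p: int) -> int:
    """Smallest primitive root modulo an odd prime p, by trying a = 2, 3, ..."""
    return next(a for a in range(2, p) if is_primitive_root(a, p))


print([a for a in range(1, 19) if is_primitive_root(a, 19)])
print([least_primitive_root(p) for p in (13, 19, 23, 41)])
```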


Exercise 7.5.3. Find all residues of order 5 mod 31, given that $2^5 \equiv 1 \pmod{31}$.


Exercise 7.5.4. (a) Prove that 2 is a primitive root (mod 13). 
(b) Use this to determine all of the other primitive roots (mod 13). 


Exercise 7.5.5. Let $g$ be a primitive root modulo odd prime $p$.
(a) Prove that if $m$ divides $p - 1$, then $g^m$ has order $\frac{p-1}{m}$.
(b)† Prove that $g^k \pmod p$ is a primitive root mod $p$ if and only if $(k, p - 1) = 1$.
(c) Deduce that there are $\phi(p - 1)$ primitive roots mod $p$.




7.6. Testing for composites, pseudoprimes, and Carmichael numbers 


In the converse to Fermat’s Little Theorem, Corollary 7.2.1, we saw that if an
integer $n$ does not divide $a^{n-1} - 1$ for some integer $a$ coprime to $n$, then $n$ is
composite. For example, taking $a = 2$ we calculate that

$$2^{1000} \equiv 562 \pmod{1001},$$

so we know that 1001 is composite. We might ask whether this always works. In
other words:
Is it true that if $n$ is composite, then $n$ does not divide $2^n - 2$?


For, if so, we have a very nice way to distinguish primes from composites. Unfortunately the answer is “no” since, for example,

$$2^{340} \equiv 1 \pmod{341},$$

but $341 = 11 \times 31$. We call 341 a base-2 pseudoprime. Note, though, that

$$3^{340} \equiv 56 \pmod{341},$$

and so the converse to Fermat’s Little Theorem, with $a = 3$, implies that 341 is
composite.

Are there composites $n$ for which $2^{n-1} \equiv 3^{n-1} \equiv 1 \pmod n$? Or $2^{n-1} \equiv 3^{n-1} \equiv 5^{n-1} \equiv 1 \pmod n$? Or, even, Carmichael numbers: composite numbers
that “masquerade” as primes in that $a^{n-1} \equiv 1 \pmod n$ for all integers $a$ coprime
to $n$? A quick computer search finds the smallest example: $561 = 3 \cdot 11 \cdot 17$. The
next few Carmichael numbers are $1105 = 5 \cdot 13 \cdot 17$, then $1729 = 7 \cdot 13 \cdot 19$, etc.
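The computations above, and the "quick computer search" for Carmichael numbers straight from the definition, can be sketched as follows (our own function names; the search is brute force and only meant for small ranges).

```python
from math import gcd, isqrt


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


print(pow(2, 1000, 1001), pow(2, 340, 341), pow(3, 340, 341))   # 562 1 56


def is_carmichael_bruteforce(n: int) -> bool:
    """Composite n with a^(n-1) = 1 (mod n) for every a coprime to n."""
    return (not is_prime(n)
            and all(pow(a, n - 1, n) == 1 for a in range(2, n) if gcd(a, n) == 1))


print([n for n in range(4, 2000) if is_carmichael_bruteforce(n)])
```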


Exercise 7.6.1. Show that squarefree $n$ is a Carmichael number if and only if $n$ is composite
and divides $a^n - a$ for all integers $a$.


Carmichael numbers are a nuisance, masquerading as primes like this (and 
so preventing a quick and easy, surefire primality test). Calculations reveal that 
Carmichael numbers are rare, but in 1994 Alford, Pomerance, and I proved 
that there are infinitely many of them. Here is a more elegant way to recognize 
Carmichael numbers: 


Lemma 7.6.1. A positive integer $n$ is a Carmichael number if and only if $n$ is
squarefree and composite and $p - 1$ divides $n - 1$ for every prime $p$ dividing $n$.


Proof. Suppose that $n$ is squarefree and composite and $p - 1$ divides $n - 1$ for
every prime $p$ dividing $n$. If $(a,n) = 1$ and prime $p$ divides $n$, then $\mathrm{ord}_p(a)$ divides
$p - 1$ by Theorem 7.1, which divides $n - 1$, and so $a^{n-1} \equiv 1 \pmod p$ by Lemma
7.1.2. Therefore $a^{n-1} \equiv 1 \pmod n$ by the Chinese Remainder Theorem, as $n$ is
squarefree, and so $n$ is a Carmichael number.


Now suppose that $n$ is a Carmichael number. If prime $p$ divides $n$, then $a^{n-1} \equiv 1 \pmod p$ for all integers $a$ coprime to $n$. In particular, if $a$ is a primitive root mod
$p$ (which, by the Chinese Remainder Theorem, we may choose to be coprime to $n$), then $p - 1 = \mathrm{ord}_p(a)$ divides $n - 1$ by Lemma 7.1.2.

Now assume that $p^e \,\|\, n$ with $e \ge 2$. We note that $(1+p)^k \equiv 1 + kp \pmod{p^2}$
for all integers $k \ge 1$, by the binomial theorem, so that $\mathrm{ord}_{p^2}(1+p) = p$. Select
$a \equiv 1 + p \pmod{p^2}$ with $a \equiv 1 \pmod{n/p^e}$, so that $(a,n) = 1$. As $p \mid n$, and $a^n \equiv a \pmod n$, we have

$$1 \equiv (1+p)^n \equiv a^n \equiv a \equiv 1 + p \pmod{p^2},$$

a contradiction. Therefore $n$ must be
squarefree.


Lemma 7.6.1 implies that $561 = 3 \cdot 11 \cdot 17$ is a Carmichael number, as 2, 10, and
16 divide 560.
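Lemma 7.6.1 replaces the check over all $a$ coprime to $n$ by a look at the factorization of $n$. A sketch of that criterion (often called Korselt's criterion), with our own trial-division `factorize`:

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization {p: e} by trial division (small n only)."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f


def is_carmichael_korselt(n: int) -> bool:
    """Lemma 7.6.1: n is Carmichael iff n is composite, squarefree,
    and p - 1 divides n - 1 for every prime p dividing n."""
    f = factorize(n)
    composite = sum(f.values()) > 1
    squarefree = all(e == 1 for e in f.values())
    return composite and squarefree and all((n - 1) % (p - 1) == 0 for p in f)


print(factorize(561), is_carmichael_korselt(561))   # {3: 1, 11: 1, 17: 1} True
```

Unlike the brute-force search, this needs no modular exponentiation at all once $n$ is factored.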


Exercise 7.6.2. Show that if n is a Carmichael number, then it is odd. 
Exercise 7.6.3.† Show that if $n$ is a Carmichael number, then it has at least three prime factors.


Exercise 7.6.4. Prove that if 6m+1, 12m+1, and 18m +1 are all primes, then their product 
is a Carmichael number. (It is an open problem whether there exist infinitely many such prime 
triples, though it is not difficult to find examples, like 7 x 13 x 19 and 37 x 73 x 109.) 
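Examples like these are easy to find by machine. The sketch below (our own code) searches $m < 50$ for prime triples $6m+1, 12m+1, 18m+1$ and checks each product against the condition of Lemma 7.6.1; it illustrates exercise 7.6.4 but of course does not prove it.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


triples = [(6 * m + 1, 12 * m + 1, 18 * m + 1) for m in range(1, 50)
           if all(is_prime(6 * k * m + 1) for k in (1, 2, 3))]

for a, b, c in triples:
    n = a * b * c
    # Lemma 7.6.1: q - 1 must divide n - 1 for each of the three primes q.
    print((a, b, c), n, all((n - 1) % (q - 1) == 0 for q in (a, b, c)))
```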


7.7. Divisibility tests, again 


In section 2.4 we found simple tests for the divisibility of integers by 7, 9, 11, and
13, promising to return to this theme later. The key to these earlier tests was
that $10 \equiv 1 \pmod 9$ and $10^3 \equiv -1 \pmod{7 \cdot 11 \cdot 13}$; that is, $\mathrm{ord}_9(10) = 1$, and
$\mathrm{ord}_7(10)$, $\mathrm{ord}_{11}(10)$, and $\mathrm{ord}_{13}(10)$ divide 6. For all primes $p \ne 2, 5$ we know that
$k := \mathrm{ord}_p(10)$ is an integer dividing $p - 1$. Hence if $n = \sum_{j \ge 0} n_j 10^j$, then

$$n = \sum_{m \ge 0} \left( \sum_{i=0}^{k-1} n_{km+i} 10^i \right) (10^k)^m \equiv \sum_{m \ge 0} \left( \sum_{i=0}^{k-1} n_{km+i} 10^i \right) \pmod p,$$

since if $j = km + i$, then $10^j \equiv 10^i \pmod p$. In the displayed equation we have
cut up the integer $n$, written in decimal, into blocks of digits of length $k$ and added
these blocks together, which is clearly an efficient way to test for divisibility. The
length of these blocks, $k$, is always $\le p - 1$, no matter what the size of $n$. Therefore
we can, in practice, quickly test whether $n$ is divisible by $p$, once we know the
$p$-divisibility of every integer $< 10^k$ ($\le 10^{p-1}$).
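A sketch of the block-sum reduction: split $n$ into base-10 blocks of $k = \mathrm{ord}_p(10)$ digits and add them. The helper names are our own; the example uses $p = 37$, where $k = 3$ because $1000 = 27 \cdot 37 + 1$.

```python
def order10(p: int) -> int:
    """ord_p(10) for p coprime to 10, by direct search."""
    k, x = 1, 10 % p
    while x != 1:
        x, k = (x * 10) % p, k + 1
    return k


def block_sum(n: int, k: int) -> int:
    """Add up the base-10 blocks of k digits of n (least significant block first)."""
    s = 0
    while n:
        n, block = divmod(n, 10 ** k)
        s += block
    return s


p = 37
k = order10(p)                       # 3
n = 123456789012345
print(k, block_sum(n, k), block_sum(n, k) % p == n % p)
```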

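If $k = 2\ell$ is even, we can do a little better (as we did with $p = 7$, 11, and 13)
since $10^\ell \equiv -1 \pmod p$ (as $(10^\ell)^2 \equiv 1$ but $10^\ell \not\equiv 1 \pmod p$), namely that

$$n = \sum_{j \ge 0} n_j 10^j \equiv \sum_{m \ge 0} \left( \sum_{i=0}^{\ell-1} n_{km+i} 10^i - \sum_{i=0}^{\ell-1} n_{km+\ell+i} 10^i \right) \pmod p,$$

thus breaking $n$ up into blocks of length $\ell = k/2$.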
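The alternating version, sketched in the same way (our own helper): with $\ell = 3$ it gives the familiar test for 7, 11, and 13, since all three divide $1001 = 10^3 + 1$.

```python
def alternating_block_sum(n: int, l: int) -> int:
    """Alternately add and subtract the base-10 blocks of l digits of n,
    starting with + on the least significant block."""
    s, sign = 0, 1
    while n:
        n, block = divmod(n, 10 ** l)
        s += sign * block
        sign = -sign
    return s


n = 2_718_281_828_459
for p in (7, 11, 13):
    # 10^3 = -1 (mod p) for each of these primes
    print(p, alternating_block_sum(n, 3) % p == n % p)
```

Python's `%` returns a non-negative result even when the alternating sum is negative, so the comparison works directly.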


7.8. The decimal expansion of fractions 


The fraction $\frac13 = .3333\ldots$ is given by a recurring digit 3, so we write it as $.\overline{3}$. More
interesting to us is the set of fractions

$$\tfrac17 = .\overline{142857}, \quad \tfrac27 = .\overline{285714}, \quad \tfrac37 = .\overline{428571},$$

$$\tfrac47 = .\overline{571428}, \quad \tfrac57 = .\overline{714285}, \quad \tfrac67 = .\overline{857142}.$$

These decimal expansions of the six fractions $\frac{a}{7}$, $1 \le a \le 6$, are each periodic of
period length 6, and each contains the same six digits in the same order but starting
at a different place. Starting with the period for $1/7$, we find that we go through the
fractions $a/7$ with $a = 1, 3, 2, 6, 4, 5$ when we rotate the period one step at a time.




Do you recognize this sequence of numbers? These are the least positive residues
of $10^0, 10^1, 10^2, 10^3, 10^4, 10^5 \pmod 7$. To prove this, we begin by noting that since
$10^6 \equiv 1 \pmod 7$, we have that

$$\frac{10^6 - 1}{7} = 142857 \text{ is an integer,}$$

which is $\le 6$ digits long. Putting the $1/7$ on the other side and dividing through
by $10^6$, we obtain

$$\frac17 = \frac{142857}{10^6} + \frac{1}{10^6} \cdot \frac17 = .142857 + \frac{1}{10^6} \cdot \frac17.$$

Substituting this expression in for the last term, divided by $10^6$, we obtain

$$\frac17 = .142857 + \frac{.142857}{10^6} + \frac{1}{10^{12}} \cdot \frac17 = \cdots = .\overline{142857},$$

the final equality by repeating this process infinitely often. Now if we multiply this
through by 10, we obtain

$$\frac{10}{7} = 1.\overline{428571}, \quad \text{so that} \quad \frac37 = \frac{10}{7} - 1 = .\overline{428571},$$

and similarly, as $10^2 \equiv 2 \pmod 7$,

$$\frac27 = \frac{10^2}{7} - \left\lfloor \frac{10^2}{7} \right\rfloor = 14.\overline{285714} - 14 = .\overline{285714}.$$

We obtain all the other decimal expansions analogously.
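Long division makes the rotation visible: the digits of $a/7$ are the digits of $1/7$ shifted left by $j$ places, where $a \equiv 10^j \pmod 7$. A minimal sketch (our own helper `repetend`, which just returns the first $n$ digits after the decimal point):

```python
def repetend(a: int, m: int, n: int) -> str:
    """First n digits after the decimal point of a/m, by long division."""
    digits, r = [], a % m
    for _ in range(n):
        r *= 10
        digits.append(str(r // m))
        r %= m
    return "".join(digits)


print([repetend(a, 7, 6) for a in range(1, 7)])
print([pow(10, j, 7) for j in range(6)])    # [1, 3, 2, 6, 4, 5]
```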


What happens when we multiply $1/7$ through by $10^k$? For example, if $k = 4$,
then

$$\frac{10^4}{7} = 1428.\overline{571428} = 1428 + \frac47.$$

The part after the decimal point is always $\{10^k/7\}$, which equals $\ell/7$, where $\ell$ is the
least positive residue of $10^k \pmod 7$ (as in exercise 1.7.4(b)). We can now give two
results.


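Proposition 7.8.1. Suppose that $m$ is an integer that is coprime to 10. If $1 \le a < m$, then the decimal expansion of $a/m$ is periodic with period of
length $\mathrm{ord}_m(10)$. This is the minimal period length if $(a,m) = 1$.


Proof sketch. We proceed analogously to the above. Let $n = \mathrm{ord}_m(10)$ and
$r = (10^n - 1)a/m$, so that $r$ is a positive integer $< 10^n$. Let $\mathbf{r}$ be the sequence of $n$
digits that give the integer $r$ (padded with leading zeros if necessary). The same argument as above gives that

$$\frac{a}{m} = \frac{r}{10^n} + \frac{1}{10^n} \cdot \frac{a}{m} = \frac{r}{10^n} + \frac{r}{10^{2n}} + \frac{1}{10^{2n}} \cdot \frac{a}{m} = \cdots = .\overline{\mathbf{r}}.$$

On the other hand, if the decimal expansion of $a/m$ has period $n$, say $a/m = .\overline{\mathbf{r}}$ for some block $\mathbf{r}$ of $n$ digits representing the integer $r$,
then $(10^n - 1)a/m = (10^n - 1) \cdot .\overline{\mathbf{r}} = r.\overline{\mathbf{r}} - .\overline{\mathbf{r}} = r$. In other words, $(10^n - 1)a/m$ is
the integer $r$, so that $10^n \equiv 1 \pmod m$ if $(a,m) = 1$.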
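The period can be measured directly by following the remainders in long division: the expansion of $a/m$ repeats exactly when the remainder returns to $a \bmod m$, that is, at the least $n$ with $10^n a \equiv a \pmod m$. The sketch below (our own helpers) compares this with $\mathrm{ord}_m(10)$ and also shows why the hypothesis $(a,m) = 1$ is needed for minimality.

```python
from math import gcd


def period_length(a: int, m: int) -> int:
    """Minimal period of the decimal expansion of a/m, for m coprime to 10:
    the least n >= 1 with 10^n * a = a (mod m), found by following remainders."""
    r0 = a % m
    r, n = (10 * r0) % m, 1
    while r != r0:
        r, n = (10 * r) % m, n + 1
    return n


def order10(m: int) -> int:
    """ord_m(10) for m coprime to 10, by direct search."""
    k, x = 1, 10 % m
    while x != 1:
        x, k = (x * 10) % m, k + 1
    return k


for m in (7, 13, 17, 21, 27, 41):
    assert all(period_length(a, m) == order10(m) for a in range(1, m) if gcd(a, m) == 1)
print(period_length(11, 33), order10(33))   # 11/33 = 1/3 has period 1, but ord_33(10) = 2
```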


Exercise 7.8.1.† Suppose that $p$ is an odd prime for which 10 is a primitive root. Let $a_k$ be the
least residue of $10^k \pmod p$, and suppose that $a_k/p = .\overline{\mathbf{r}_k}$ where $1 \le r_k < 10^{p-1}$. Prove that $\mathbf{r}_k$
is obtained from $\mathbf{r}_0$ by removing the leading $k$ digits and concatenating them on to the end.


Exercise 7.8.2. Prove that the decimal expansion of every rational number is eventually periodic.
(One can see why we need “eventually” with the example $\frac{1}{30} = .03333\ldots$.)




7.9. Primes in arithmetic progressions, revisited 


We can use the ideas in this chapter to prove that there are infinitely many primes 
in certain arithmetic progressions 1 (mod m). 


Theorem 7.7. There are infinitely many primes $\equiv 1 \pmod 3$.


Proof. Suppose there are only finitely many primes $\equiv 1 \pmod 3$, say $p_1, p_2, \ldots, p_r$.
Let $a = 3 p_1 p_2 \cdots p_r$, and let $q$ be a prime dividing $a^2 + a + 1$. Now $q \ne 3$, as
$a^2 + a + 1 \equiv 1 \pmod 3$. Moreover, $q$ divides $a^3 - 1 = (a - 1)(a^2 + a + 1)$, but
not $a - 1$ (or else $0 \equiv a^2 + a + 1 \equiv 1 + 1 + 1 = 3 \pmod q$, but $q \ne 3$). Therefore
$\mathrm{ord}_q(a) = 3$, and so $q \equiv 1 \pmod 3$ by Theorem 7.1. Hence $q = p_j$ for some $j$, so
that $q$ divides $a$ as well as $a^2 + a + 1$, and thus $q$ divides $(a^2 + a + 1) - a(a + 1) = 1$,
which is impossible.
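To see the construction in action, pretend that 7 and 13 were the only primes $\equiv 1 \pmod 3$. Then $a = 3 \cdot 7 \cdot 13 = 273$, and every prime factor of $a^2 + a + 1$ should be a new prime $\equiv 1 \pmod 3$. A sketch with our own trial-division helper:

```python
def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n by trial division (small n only)."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


known = [7, 13]                   # hypothetical "complete" list of primes = 1 (mod 3)
a = 3 * 7 * 13                    # a = 3 p_1 ... p_r = 273
qs = prime_factors(a * a + a + 1)
print(a * a + a + 1, qs, [q % 3 for q in qs])   # each q = 1 (mod 3), none in known
```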


This, together with the earlier theorem showing that there are infinitely many primes $\equiv 2 \pmod 3$, proves that there are infinitely many primes
in both of the residue classes $1 \pmod 3$ and $2 \pmod 3$, as predicted from the data
at the start of section 5.3.


Exercise 7.9.1. Generalize this argument to primes that are 1 (mod 4), to primes that are 1 
(mod 5), and to primes that are 1 (mod 6). 


In order to generalize this argument to proving the existence of primes $\equiv 1 \pmod m$ for every integer $m \ge 3$, including composite $m$, we need to replace the
polynomial $a^2 + a + 1$ by one that recognizes when $a$ has order $m$. Evidently this
must be a divisor of the polynomial $a^m - 1$; indeed, $a^m - 1$ divided through by
all of the factors corresponding to orders which are proper divisors of $m$. This
discussion leads us to define the cyclotomic polynomials $\phi_m(t) \in \mathbb{Z}[t]$ inductively,
by the requirement

$$t^m - 1 = \prod_{d \mid m} \phi_d(t) \quad \text{for all } m \ge 1, \tag{7.9.1}$$

with each $\phi_d(t)$ monic (see also appendix 4E). Therefore $\phi_1(t) = t - 1$,
$\phi_2(t) = t + 1$, $\phi_3(t) = t^2 + t + 1$, $\phi_4(t) = t^2 + 1$, $\phi_5(t) = t^4 + t^3 + t^2 + t + 1, \ldots$.
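Definition (7.9.1) is also an algorithm: divide $t^m - 1$ by $\phi_d(t)$ for each proper divisor $d$ of $m$. A sketch with integer coefficient lists (constant term first); `poly_div_exact` and `cyclotomic` are our own names, and the division relies on each divisor being monic.

```python
from functools import lru_cache


def poly_div_exact(num: list[int], den: list[int]) -> list[int]:
    """Exact quotient num/den of integer polynomials (constant term first); den monic."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    for i in range(len(q) - 1, -1, -1):
        c = num[i + len(den) - 1]          # current leading coefficient
        q[i] = c
        for j, d in enumerate(den):
            num[i + j] -= c * d
    assert all(x == 0 for x in num), "division was not exact"
    return q


@lru_cache(maxsize=None)
def cyclotomic(m: int) -> tuple[int, ...]:
    """phi_m(t) from (7.9.1): t^m - 1 divided by phi_d(t) for every proper divisor d of m."""
    poly = [-1] + [0] * (m - 1) + [1]      # t^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = poly_div_exact(poly, list(cyclotomic(d)))
    return tuple(poly)


for m in range(1, 7):
    print(m, cyclotomic(m))
```

Printed as tuples, $\phi_6(t) = t^2 - t + 1$ appears as `(1, -1, 1)`. The coefficients are not always $0$ or $\pm 1$: $\phi_{105}(t)$ has a coefficient $-2$.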
Theorem 7.8. For any integer $m \ge 2$, there are infinitely many primes $\equiv 1 \pmod m$.


Proof. Suppose that $p_1, \ldots, p_r$ are all the primes that are $\equiv 1 \pmod m$, and let
$a = m p_1 \cdots p_r$. Let $q$ be a prime divisor of $\phi_m(a)$, which divides $a^m - 1$, so that
$a^m \equiv 1 \pmod q$. This implies that $(q, a)$ divides $(a^m - 1, a) = 1$, and so $(q, a) = 1$.
In particular $q$ is not a $p_j$ and does not divide $m$.

Let $d = \mathrm{ord}_q(a)$, so that $q \equiv 1 \pmod d$ by Theorem 7.1. Moreover $d$ divides $m$,
as $a^m \equiv 1 \pmod q$. But $q$ is not a $p_j$, and so $q \not\equiv 1 \pmod m$, which implies that
$d \ne m$, and therefore $d < m$.


Now $\phi_m(x)$ divides $\frac{x^m - 1}{x^d - 1}$ by definition (by (7.9.1), since $x^d - 1 = \prod_{e \mid d} \phi_e(x)$ and $m \nmid d$). Substituting in $x = a$, we deduce that
$q$ divides both $\frac{a^m - 1}{a^d - 1}$ and $a^d - 1$, so that

$$0 \equiv \frac{a^m - 1}{a^d - 1} = \sum_{j=0}^{m/d - 1} (a^d)^j \equiv \sum_{j=0}^{m/d - 1} 1 = \frac{m}{d} \pmod q.$$

This implies that $q$ divides $m/d$, and therefore divides $m$, which contradicts what
we proved above.
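The key step is that a prime $q$ dividing $\phi_m(a)$ either divides $m$ or is $\equiv 1 \pmod m$. Numerically, $\phi_m(a)$ can be computed from (7.9.1) by dividing $a^m - 1$ by $\phi_d(a)$ for the proper divisors $d$ of $m$. A sketch (our own names; trial division limits it to small values):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cyclotomic_value(m: int, a: int) -> int:
    """phi_m(a) for an integer a >= 2, computed from (7.9.1):
    a^m - 1 is the product of phi_d(a) over all d dividing m."""
    value = a ** m - 1
    for d in range(1, m):
        if m % d == 0:
            value //= cyclotomic_value(d, a)
    return value


def prime_factors(n: int) -> list[int]:
    """Distinct prime factors of n by trial division (small n only)."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


for m in (5, 6, 12, 20):
    qs = prime_factors(cyclotomic_value(m, 10))
    print(m, cyclotomic_value(m, 10), qs, [q % m for q in qs])
```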


References for this chapter 


[1] W. Red Alford, Andrew Granville, and Carl Pomerance, There are infinitely many Carmichael 
numbers, Ann. of Math. 139 (1994), 703-722. 


[2] D. R. Heath-Brown, Artin’s conjecture for primitive roots, Quart. J. Math. Oxford Ser. 37 (1986), 
27-38. 


[3] M. Ram Murty, Artin’s conjecture for primitive roots, Math. Intelligencer 10 (1988), 59-67. 


Additional exercises 


Exercise 7.10.1. Prove that we can write any polynomial $f(x)$ mod $p$ of degree $\le p - 1$ as

$$f(x) \equiv \sum_{a=0}^{p-1} f(a)\left(1 - (x - a)^{p-1}\right) \pmod p.$$


Exercise 7.10.2.† Prove that if $f(x) \in \mathbb{Z}[x]$ is monic and has degree $d$, and if prime $p$ divides
$f(0), f(1), \ldots, f(d)$, then $p \le d$ and $p$ divides $f(n)$ for all integers $n$.


Exercise 7.10.3. We will find all powers of 2 and 3 that differ by 1, a special case of Catalan’s
conjecture mentioned in an earlier section.
(a) What are the powers of 3 (mod 8)? What are the powers of 2 (mod 8)?
(b) Show that if $2^n - 3^m \equiv 1 \pmod 8$ for some positive integers $m, n$, then $n = 1$ or 2.
(c) Deduce that the only solutions to $2^n - 3^m = 1$ are $4 - 3 = 2 - 1 = 1$.
(d) Prove that if $3^m - 2^n = 1$ with $m$ odd, then $m = n = 1$.
(e) Prove that if $3^{2k} - 2^n = 1$, then both $3^k - 1$ and $3^k + 1$ are powers of 2, and that this is only
possible if $k = 1$. We deduce that the only solutions to $3^m - 2^n = 1$ are $3 - 2 = 9 - 8 = 1$.


(This is the proof of Levi ben Gershon from around 1320.)


Exercise 7.10.4.† Show that if $\binom{n}{3}$ with $n \ge 3$ has no more than one prime factor which is $> 3$,
then $n = 3$, 4, 5, 6, 8, 9, 10, or 18. (Use exercise 7.10.3.)


Exercise 7.10.5. (a) Prove that if $a > 1$, then the order of $a$ mod $N := a^q - 1$ is exactly $q$.
Now let $q$ be a prime.
(b) Deduce that if prime $p$ divides $a^q - 1$ but not $a - 1$, then $p$ is a prime $\equiv 1 \pmod q$.
(c) Prove that $\left(\frac{a^q - 1}{a - 1}, a - 1\right) = (q, a - 1)$.
(d)† Prove that there are infinitely many primes $\equiv 1 \pmod q$.


Exercise 7.10.6. Let $p$ be an odd prime, and let $x$, $y$, and $z$ be pairwise coprime, positive
integers.
(a)† Prove that if $p$ divides $z - y$, then $\frac{z^p - y^p}{z - y} \equiv p \pmod{p^2}$.
(b) Show that if $x^p + y^p = z^p$, then there exists an integer $r$ for which $z - y = r^p$ or $z - y = p^{p-1} r^p$.
(This problem continues on from exercise 3.9.7.)


Exercise 7.10.7. Deduce Theorem 7.6 from (7.5.1) using the Möbius inversion formula (Theorem
4.4).


Exercise 7.10.8. Let $p$ be a prime. Prove that every quadratic non-residue (mod $p$) is a primitive
root if and only if $p$ is a Fermat prime.


Exercise 7.10.9. Suppose that $g$ is a primitive root modulo odd prime $p$. Prove that $-g$ is also
a primitive root mod $p$ if and only if $p \equiv 1 \pmod 4$.




Exercise 7.10.10. (a) Show that the number of primes up to $N$ equals, exactly (for $N \ge 4$),

$$\sum_{2 \le n \le N} \frac{n}{n-1} \left\{ \frac{(n-1)!}{n} \right\} - \frac23.$$

(Here $\{t\}$ is the fractional part of $t$, defined as in exercise 1.7.4(b).)


(b) Suppose that $n > 1$. Show that $n$ and $n + 2$ are both odd primes if and only if $n(n+2)$
divides $4((n-1)! + 1) + n$.
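Before proving the formula in part (a), one can check it with exact rational arithmetic for small $N$. This sketch (our own function name) uses Python's `fractions.Fraction`, so no floating-point error enters; the $-\frac23$ cancels the contribution $\frac{4}{3}\cdot\frac{1}{2}$ of $n = 4$, which is why $N \ge 4$ is needed.

```python
from fractions import Fraction
from math import factorial


def pi_via_wilson(N: int) -> Fraction:
    """Sum over 2 <= n <= N of n/(n-1) * {(n-1)!/n}, minus 2/3 (valid for N >= 4)."""
    total = Fraction(0)
    for n in range(2, N + 1):
        frac_part = Fraction(factorial(n - 1) % n, n)   # the fractional part {(n-1)!/n}
        total += Fraction(n, n - 1) * frac_part
    return total - Fraction(2, 3)


print([int(pi_via_wilson(N)) for N in (4, 10, 30, 100)])
```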


Exercise 7.10.11. Prove that if $f(x) \in \mathbb{Z}[x]$ has degree $\le p - 2$, then $\sum_{a \pmod p} f(a) \equiv 0 \pmod p$.


Exercise 7.10.12.† Let $p$ be an odd prime and $k$ be an odd integer which is $\not\equiv 1 \pmod{p-1}$.
Prove that $1^k + 2^k + \cdots + (p-1)^k \equiv 0 \pmod{p^2}$.


Exercise 7.10.13.† Let $a_{n+1} = 2a_n + 1$ for all $n \ge 0$. Can we choose $a_0$ so that this sequence
consists entirely of primes?


We define $n$ to be a base-$b$ pseudoprime if $n$ is composite and $b^{n-1} \equiv 1 \pmod n$.


Exercise 7.10.14. Show that if $n$ is not prime, then it is a base-$b$ pseudoprime if and only if
$\mathrm{ord}_{p^k}(b)$ divides $n - 1$ for every prime power $p^k$ dividing $n$.


Exercise 7.10.15. Suppose that $n$ is a squarefree, composite integer.
(a) Show that $\#\{a \pmod p : a^{n-1} \equiv 1 \pmod p\} = (p - 1, n - 1)$ for each prime $p$ dividing $n$.
(b) Show that there are $\prod_{p \mid n} (p - 1, n - 1)$ reduced residue classes $b \pmod n$ for which $n$ is a
base-$b$ pseudoprime.
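A brute-force count agrees with the formula in part (b) for small squarefree examples such as $341 = 11 \cdot 31$. This is only a numerical check (our own function name), not a proof.

```python
from math import gcd


def pseudoprime_bases(n: int) -> int:
    """Number of reduced residues b (mod n) with b^(n-1) = 1 (mod n)."""
    return sum(1 for b in range(1, n) if gcd(b, n) == 1 and pow(b, n - 1, n) == 1)


# n = 341 = 11 * 31: the formula predicts gcd(10, 340) * gcd(30, 340) = 10 * 10.
print(pseudoprime_bases(341), gcd(10, 340) * gcd(30, 340))
```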


Exercise 7.10.16. (a) Prove that if $n$ is composite, then $\{b \pmod n : n \text{ is a base-}b \text{ pseudoprime}\}$ is a subgroup of the reduced residues mod $n$.
(b)† Prove that if $n$ is not a Carmichael number, then it is not a base-$b$ pseudoprime for at least
half of the reduced residues $b \pmod n$.
(c)† Suppose that $p$ and $2p - 1$ are both prime and let $n = p(2p - 1)$. Prove that

$$\#\{b \pmod n : n \text{ is a base-}b \text{ pseudoprime}\} = \tfrac12 \phi(n).$$


Exercise 7.10.17. (a) Show that if $p$ is prime, then the Mersenne number $2^p - 1$ is either a
prime or a base-2 pseudoprime.
(b) Show that every Fermat number $2^{2^n} + 1$ is either a prime or a base-2 pseudoprime.
(c) Show that $p^2$ divides $2^{p-1} - 1$ if and only if $p^2$ is a base-2 pseudoprime.


None of these criteria guarantees that there are infinitely many base-2 pseudoprimes. However, this is provable:


Exercise 7.10.18.† Prove that there are infinitely many base-2 pseudoprimes by proving and
developing one of the following two observations:


• Start with 341, and show that if $n$ is a base-2 pseudoprime, then so is $N := 2^n - 1$.
• Prove that if $p > 3$ is prime, then $(4^p - 1)/3$ is a base-2 pseudoprime.


Can you generalize either of these proofs to other bases?


Exercise 7.10.19. Let $a, b, c$ be pairwise coprime positive integers. Prove that there exists a
(unique) residue class $m_0 \pmod{abc}$ such that if $m \equiv m_0 \pmod{abc}$ and if $am + 1$, $bm + 1$, and
$cm + 1$ are all primes, then their product is a Carmichael number (for example, $a = 1$, $b = 2$, $c = 3$
in exercise 7.6.4, with $m_0 = 0$).


Exercise 7.10.20. Let $D$ be a finite set of at least two distinct positive integers, the elements of
which sum to $n$. Suppose that $d$ divides $n$ for every $d \in D$. Prove that if there exists an integer
$m$ for which $p_d := dnm + 1$ is prime for every $d \in D$, then $\prod_{d \in D} p_d$ is a Carmichael number. (In
particular, note the case in which $n$ is perfect and $D$ is the set of proper divisors of $n$. The perfect
number 6, for example, gives rise to the triple $6m + 1, 12m + 1, 18m + 1$, which we explored in
exercise 7.6.4.)




Exercise 7.10.21. (a) Prove that $.010010000100\ldots$ is irrational. (Here we put a “1” two digits
after the decimal point, then 3 digits later, then 5 digits later, etc., with all the other digits
being 0, so that the runs of 0’s before and between the “1”s have lengths $p - 1$ for each consecutive prime $p$.)

(b)† Develop this idea to find a large class of irrationals.


Appendix 7A. Card shuffling 
and Fermat’s Little Theorem 


In this appendix we will define order in terms of card shuffling, give a combinatorial 
proof of Fermat’s Little Theorem, and discuss quick calculations of powers mod n. 


7.11. Card shuffling and orders modulo n 


The cards in a 52-card deck can be arranged in $52! \approx 8 \times 10^{67}$ different orders. Between card games we shuffle the cards to make the order of the cards unpredictable.
But what if someone can shuffle “perfectly”? How unpredictable will the order of 
the cards then be? Let’s analyze this by carefully figuring out what happens in a 
“perfect shuffle”. There are several ways of shuffling cards, the most common being 
the riffle shuffle. In a riffle shuffle one splits the deck in two, places the two halves 
in either hand, and then drops the cards, using one’s thumbs, in order to more or 
less interlace the cards from the two decks. 


One begins with a deck of 52 cards and, to facilitate our discussion, we will
call the top card card 1, the next card down card 2, etc. If one performs a perfect
riffle shuffle, one cuts the cards into two 26-card halves, one half with the cards
1 through 26, the other half with the cards 27 through 52. An “out-shuffle” then
interlaces the two halves so that the new order of the cards becomes (from the top)
cards

$$1, 27, 2, 28, 3, 29, 4, 30, \ldots.$$

That is, cards $1, 2, 3, \ldots, 26$ go to positions $1, 3, 5, \ldots, 51$, and cards $27, 28, \ldots, 52$
go to positions $2, 4, \ldots, 52$, respectively. We can give formulas for each half:

$$k \mapsto \begin{cases} 2k - 1 & \text{for } 1 \le k \le 26, \\ 2k - 52 & \text{for } 27 \le k \le 52. \end{cases}$$




These coalesce into one formula $k \mapsto 2k - 1 \pmod{51}$ for all $k$, $1 \le k \le 52$. The
top and bottom cards do not move, that is, $1 \mapsto 1$ and $52 \mapsto 52$, so we focus on
understanding the permutation of the other fifty cards:


Any shuffle induces a permutation $\sigma$ on $\{1, \ldots, 52\}$.⁶ For the out-shuffle, $\sigma(1) = 1$, $\sigma(52) = 52$, and

$$\sigma(1 + m) \text{ is the least positive residue of } 1 + 2m \pmod{51} \text{ for } 1 \le m \le 50.$$

To determine what happens after two or more out-shuffles, we simply compute the
function $\sigma^k(\cdot) = \underbrace{\sigma(\sigma(\cdots\sigma}_{k \text{ times}}(\cdot)))$. Evidently $\sigma^k(1) = 1$, $\sigma^k(52) = 52$, and then

$$\sigma^k(1 + m) \text{ is the least positive residue of } 1 + 2^k m \pmod{51} \text{ for } 1 \le m \le 50.$$

Now $2^8 \equiv 1 \pmod{51}$, and so $\sigma^8(1 + m) \equiv 1 + m \pmod{51}$ for all $m$. Therefore
eight perfect out-shuffles return the deck to its original state. So much for the $52!$
possible orderings!
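A direct simulation confirms the count. This sketch models a perfect out-shuffle as a list operation (our own function; it assumes an even number of cards and keeps the top card on top):

```python
def out_shuffle(deck: list[int]) -> list[int]:
    """Perfect out-shuffle: cut into equal halves and interlace, top card stays on top."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    return [card for pair in zip(top, bottom) for card in pair]


deck = list(range(1, 53))
d, shuffles = out_shuffle(deck), 1
while d != deck:
    d, shuffles = out_shuffle(d), shuffles + 1
print(out_shuffle(deck)[:8], shuffles)   # [1, 27, 2, 28, 3, 29, 4, 30] 8
```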


Eight more perfect out-shuffles will also return the deck to its original state, a
total of 16 perfect out-shuffles, and also 24 or 32 or 40, etc.; indeed, any multiple of
8. Since $2^4 = 16 \not\equiv 1 \pmod{51}$, the order of 2 (mod 51) is 8, and so $2^r \equiv 1 \pmod{51}$ if and only
if $r$ is divisible by 8. This shows, we hope, why the notion of order is interesting,
and exhibits one of the key results (Lemma 7.1.2) about orders.


Exercise 7.11.1.† An “in-shuffle” is the riffle shuffle that interlaces the cards the other way;
that is, after one shuffle, the order becomes cards $27, 1, 28, 2, 29, \ldots, 52, 26$. Analyze this in an
analogous way to the above, and determine how many “in-shuffles” it takes to get the cards back
into their original order.


Exercise 7.11.2.† What happens when one performs riffle shuffles on $n$-card decks, with $n$ even?


Exercise 7.11.3.† Suppose that the dealer alternates between in-shuffles and out-shuffles. How
many such pairs of shuffles does it take to get the deck of cards back into their original order?


Persi Diaconis is a Stanford mathematics professor who left home at the age of fourteen to learn from sleight-of-hand legend Dai Vernon.⁴ It is said that Diaconis can shuffle to obtain any permutation of a deck of playing cards. We are interested in the highest possible order of a shuffle. To analyze this question, remember that a shuffle can be reinterpreted as a permutation $\sigma$ on $\{1,\dots,n\}$ (where $n=52$ for a usual deck). One way to explicitly write down a permutation is to track the orbit of each number. For example, for the permutation $\sigma$ on 5 elements given by


$\sigma(1)=4,\ \sigma(2)=5,\ \sigma(3)=1,\ \sigma(4)=3,\ \sigma(5)=2,$

1 gets mapped to 4, which gets mapped to 3, and 3 gets mapped back to 1, whereas 2 gets mapped to 5 and 5 gets mapped back to 2, so we can write
$$\sigma = (1,4,3)(2,5).$$


Each of $(1,4,3)$ and $(2,5)$ is a cycle, and cycles cannot be decomposed any further. Any permutation can be decomposed into disjoint cycles in a unique way (up to the order of the cycles); this is the analogue, for permutations, of the Fundamental Theorem of Arithmetic. What is the order of $\sigma$? Now $\sigma^n = (1,4,3)^n(2,5)^n$, so that $\sigma^n(1)=1$, $\sigma^n(4)=4$, and $\sigma^n(3)=3$ if


³That is, $\sigma: \{1,\dots,52\} \to \{1,\dots,52\}$ such that the $\sigma(i)$ are all distinct (and so $\sigma$ has an inverse).
⁴Check out this story, and these larger-than-life characters, on Wikipedia.


254 Appendix 7A. Card shuffling and Fermat’s Little Theorem 


and only if 3 divides $n$, while $\sigma^n(2)=2$ and $\sigma^n(5)=5$ if and only if 2 divides $n$. Therefore $\sigma^n$ is the identity if and only if 6 divides $n$, and so $\sigma$ has order 6.
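As an illustration (a Python sketch assuming Python 3.9 or later for `math.lcm`; the helper names are ours), one can compute the cycle decomposition of a permutation given as a dictionary and read off its order as the least common multiple of the cycle lengths, which is the content of exercise 7.11.4(a):

```python
from math import lcm


def cycles(perm):
    """Disjoint-cycle decomposition of a permutation given as a dict i -> perm[i]."""
    seen, result = set(), []
    for start in perm:
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:          # follow the orbit of `start` until it closes up
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        result.append(tuple(cycle))
    return result


def order(perm):
    """Order of the permutation: the lcm of its cycle lengths."""
    return lcm(*(len(c) for c in cycles(perm)))


sigma = {1: 4, 2: 5, 3: 1, 4: 3, 5: 2}
print(cycles(sigma))   # [(1, 4, 3), (2, 5)]
print(order(sigma))    # 6
```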


Exercise 7.11.4. Suppose that $\sigma$ is a permutation on $\{1,\dots,n\}$ and that $\sigma = C_1\cdots C_k$, where $C_1,\dots,C_k$ are disjoint cycles.
(a) Show that the order of $\sigma$ equals the least common multiple of the lengths of the cycles $C_j$, $1\le j\le k$.
(b) Use this to find the order of the permutation corresponding to an out-shuffle.
(c) Prove that if $n_1,\dots,n_k$ are any set of positive integers for which $n_1+\cdots+n_k=n$, then there exists a permutation $\sigma = C_1\cdots C_k$ on $\{1,\dots,n\}$, where each $C_j$ has length $n_j$.
(d) Deduce that the maximum order, $m(n)$, of a permutation $\sigma$ on $\{1,\dots,n\}$ is given by $\max \operatorname{lcm}[n_1,\dots,n_k]$ over all integers $n_1,\dots,n_k\ge 1$ for which $n_1+\cdots+n_k=n$.


Our goal is to determine $m(52)$, the highest order of any shuffle that Diaconis can perform on a regular deck of 52 playing cards. However, it is unclear how to determine $m(n)$ systematically. Working through the possibilities for small $n$, using exercise 7.11.4, we find that


$m(5) = 6$ obtained from $6 = 3\cdot 2$ and $5 = 3+2$,
$m(6) = 6$ obtained from $6 = 3\cdot 2\cdot 1$ and $6 = 3+2+1$,
$m(7) = 12$ obtained from $12 = 4\cdot 3$ and $7 = 4+3$,
$m(8) = 15$ obtained from $15 = 5\cdot 3$ and $8 = 5+3$,
$m(9) = 20$ obtained from $20 = 5\cdot 4$ and $9 = 5+4$,
$m(10) = 30$ obtained from $30 = 5\cdot 3\cdot 2$ and $10 = 5+3+2$,
$m(11) = 30$ obtained from $30 = 6\cdot 5$ and $11 = 6+5$,
$m(12) = 60$ obtained from $60 = 5\cdot 4\cdot 3$ and $12 = 5+4+3$.
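These values can be checked by brute force. The Python sketch below (hypothetical helper `max_order`; Python 3.9+ for `math.lcm`) searches over all ways of writing $n$ as a sum of parts of size at least 2 (parts equal to 1 do not change the lcm) and returns the largest lcm; note that it gives $m(8)=15$, from $8 = 5+3$. The search is exponential in $n$, so it is only practical for small $n$ and is no substitute for exercises 7.11.5 and 7.11.6.

```python
from math import lcm


def max_order(n):
    """m(n) = max lcm[n_1, ..., n_k] over n_1 + ... + n_k = n (brute force)."""
    best = 1

    def search(remaining, largest, current):
        nonlocal best
        best = max(best, current)
        # try each next part of size >= 2, in non-increasing order; 1s fill what is left
        for part in range(min(remaining, largest), 1, -1):
            search(remaining - part, part, lcm(current, part))

    search(n, n, 1)
    return best


print([max_order(n) for n in range(5, 13)])   # [6, 6, 12, 15, 20, 30, 30, 60]
```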


No obvious pattern jumps out (at least to the author) from this data, though one 
observes one technical issue: 


Exercise 7.11.5.' Show that there is a permutation $\sigma = C_1\cdots C_k$ on $\{1,\dots,n\}$ of order $m(n)$ in which the length of each cycle is either 1 or a power of a distinct prime.


Exercise 7.11.6.' Use the previous exercise to determine m(52). 


Exercise 7.11.7. Use exercise to prove that $\log m(n) \sim \sqrt{n\log n}$.


7.12. The “necklace proof” of Fermat’s Little Theorem 


Little Sophie has a necklace-making kit, which comes with wires that each accommodate $p$ beads, and unlimited supplies of beads of $a$ different colors. How many genuinely different necklaces can Sophie make? Two necklaces are equivalent if they can be obtained from each other by a rotation; otherwise they are different. So Sophie is asking for the number of equivalence classes of sequences of length $p$ where each entry is selected from $a$ possible colors.


Suppose we have a necklace with the $j$th bead having color $c(j)$ for $1\le j\le p$. One can rotate the necklace in $p$ different ways: if we rotate the necklace $k$ places for some $k$ in the range $0\le k\le p-1$, then the $j$th bead will have color $c(j+k)$ for $1\le j\le p$, where $c(\cdot)$ is taken to be a function of period $p$. If two of these equivalent necklaces are identical, then $c(j+k) = c(j+\ell)$ for all $j$, for some $0\le k<\ell\le p-1$. Then $c(n+d) = c(n)$ for all $n$, where $d = \ell-k\in[1,p-1]$, and so $c(md) = c(0)$ for all $m$. Since $p$ is prime and $(d,p)=1$, the multiples $md$ run through every residue class mod $p$; that is, all of the beads in the necklace have the same color.




Therefore we have proved that, other than the $a$ necklaces made of beads of the same color, which each belong to an equivalence class of size 1, all other necklaces belong to equivalence classes of size $p$. Since there are $a^p$ possible sequences of length $p$ with $a$ possible colors for each entry, and $a$ sequences that all have the same color, the total number of equivalence classes (different necklaces) is
$$a + \frac{a^p - a}{p}.$$
In particular, we have established that $p$ divides $a^p - a$ for all $a$, as desired.⁵
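A quick numerical check of this count (a brute-force Python sketch, not part of the argument; `count_necklaces` is a hypothetical name): represent each necklace by the lexicographically smallest of its $p$ rotations and count the distinct representatives.

```python
from itertools import product


def count_necklaces(a, p):
    """Number of length-p color sequences over a colors, counted up to rotation (brute force)."""
    representatives = set()
    for seq in product(range(a), repeat=p):
        rotations = [seq[k:] + seq[:k] for k in range(p)]
        representatives.add(min(rotations))   # one canonical rotation per equivalence class
    return len(representatives)


for a, p in [(2, 3), (3, 5), (2, 7)]:
    print(a, p, count_necklaces(a, p), a + (a**p - a) // p)
# prints 2 3 4 4, then 3 5 51 51, then 2 7 20 20
```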


Exercise 7.12.1. Let $p$ be prime. Let $X$ denote a finite set and $f: X\to X$ where $f^p = i$, the identity map. (Here $f^p$ means composing $f$ with itself $p$ times.) Let $X_{\mathrm{fixed}} := \{x\in X : f(x) = x\}$.
(a) Prove that $|X| \equiv |X_{\mathrm{fixed}}| \pmod p$.
Let $G$ be a finite multiplicative group and $X = \{(x_1,\dots,x_p)\in G^p : x_1\cdots x_p = 1\}$.
(b)* Deduce that $\#\{g\in G : g \text{ has order } p\} \equiv |G|^{p-1} - 1 \pmod p$.
(c) Deduce that if $p$ divides the order of finite group $G$, then $G$ contains an element of order $p$. Combined with Lagrange’s Theorem, Corollary 7.23.1 of appendix 7D, this is an “if and only if” criterion.


Exercise 7.12.2. Let p be a given prime. 
(a) Use of appendix 4C to determine the number of irreducible polynomials mod p of 
prime degree q. 
(b) Deduce that $p^q \equiv p \pmod q$ for every prime $q$.
(c) Deduce Fermat’s Little Theorem. 




7.13. Taking powers efficiently 


How can one raise a residue class mod $m$ to the $n$th power “quickly”, when $n$ is very large? In 1785 Legendre computed high powers mod $p$ by fast exponentiation: to determine $5^{65} \pmod{161}$, we write 65 in base 2, that is, $65 = 2^6 + 2^0$, so that $5^{65} = 5^{2^6}\cdot 5^{2^0}$. Let $f_0 = 5$ and $f_1 = f_0^2 = 5^2 = 25 \pmod{161}$. Next let $f_2 = f_1^2 = 5^4 \equiv 25^2 \equiv 142 \pmod{161}$, and then $f_3 = f_2^2 = 5^8 \equiv 142^2 \equiv 39 \pmod{161}$. We continue computing $f_j = 5^{2^j} \equiv f_{j-1}^2 \pmod{161}$ by successive squaring: $f_4 \equiv 72$, $f_5 \equiv 32$, $f_6 \equiv 58 \pmod{161}$, and so $5^{65} = 5^{64+1} = f_6\cdot f_0 \equiv 58\cdot 5 \equiv 129 \pmod{161}$. We have determined the value of $5^{65} \pmod{161}$ in seven multiplications mod 161, as opposed to 64 multiplications by the more obvious algorithm.


⁵We’ve seen that Fermat’s Little Theorem arises in many different contexts. Even its earliest discoverers got there for different reasons: Fermat, Euler, and Lagrange were led to Fermat’s Little Theorem by the search for perfect numbers, whereas Gauss was led to it by studying the periods in the decimal expansion of fractions (as in section 7.8). It seems to be a universal truth, rather than simply an ad hoc discovery.

In general, to compute $a^n \pmod m$ quickly, define
$$f_0 = a \quad\text{and then}\quad f_j \equiv f_{j-1}^2 \pmod m \quad\text{for } j = 1,2,\dots,j_1,$$
where $j_1$ is the largest integer for which $2^{j_1}\le n$. Writing $n$ in binary, say as $n = 2^{j_1} + 2^{j_2} + \cdots + 2^{j_\ell}$ with $j_1 > j_2 > \cdots > j_\ell \ge 0$, let $g_1 = f_{j_1}$ and then
$$g_i \equiv g_{i-1} f_{j_i} \pmod m \quad\text{for } i = 2,3,\dots,\ell.$$
Therefore
$$g_\ell \equiv f_{j_1}\cdot f_{j_2}\cdots f_{j_\ell} \equiv a^{2^{j_1}+2^{j_2}+\cdots+2^{j_\ell}} = a^n \pmod m.$$
This involves $j_1 + \ell - 1 \le 2j_1 \le \frac{2\log n}{\log 2}$ multiplications mod $m$, as opposed to $n$ multiplications mod $m$ by the more obvious algorithm.
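Here is a minimal Python sketch of this procedure (the name `power_mod` is ours; Python's built-in three-argument `pow` already performs fast modular exponentiation). It runs through the binary digits of $n$ from the lowest one up, squaring to obtain each $f_j$ and multiplying in those $f_j$ whose binary digit is 1, and it also counts the multiplications mod $m$, so one can check the count $j_1+\ell-1$ on Legendre's example.

```python
def power_mod(a, n, m):
    """Return (a^n mod m, number of multiplications mod m used), for n >= 0.

    Squares f_j = a^(2^j) repeatedly and multiplies together the f_j
    for which the j-th binary digit of n is 1.
    """
    result = None            # product of the f_j used so far (None = empty product)
    f = a % m                # f_0 = a
    mults = 0
    while True:
        if n & 1:
            if result is None:
                result = f
            else:
                result = result * f % m
                mults += 1
        n >>= 1
        if n == 0:
            break
        f = f * f % m        # f_j = f_{j-1}^2
        mults += 1
    return (1 % m if result is None else result), mults


print(power_mod(5, 65, 161))   # (129, 7), as in Legendre's example
```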
One can often use fewer multiplications. For example, for $31 = 1+2+4+8+16$ the above uses 8 multiplications, but we can use just 7 multiplications if, instead, we determine $a^{31} \pmod m$ by computing $a^2 = a\cdot a$; $a^3 = a^2\cdot a$; $a^6 = a^3\cdot a^3$; $a^{12} = a^6\cdot a^6$; $a^{24} = a^{12}\cdot a^{12}$; $a^{30} = a^{24}\cdot a^6 \pmod m$; and finally $a^{31} = a^{30}\cdot a \pmod m$. These exponents form an addition chain, a sequence of integers $1 = e_1 < e_2 < \cdots < e_r$ where, for each $k$ with $1 < k\le r$, we have $e_k = e_i + e_j$ for some $i,j\in\{1,\dots,k-1\}$ (possibly with $i=j$). In the example above, the binary digits of 31 led to the addition chain $1,2,3,4,7,8,15,16,31$, but the addition chain $1,2,3,6,12,24,30,31$ is shorter.
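To experiment with addition chains, the following Python sketch (hypothetical helper names, not from the text) checks that a list is an addition chain and evaluates $a^{e_r} \bmod m$ with exactly one multiplication per step of the chain; on the two chains for 31 above it uses 8 and 7 multiplications respectively.

```python
def is_addition_chain(chain):
    """True if chain starts at 1, increases strictly, and each later term is a sum of two earlier ones."""
    if not chain or chain[0] != 1:
        return False
    for k in range(1, len(chain)):
        earlier = set(chain[:k])
        if chain[k] <= chain[k - 1] or not any(chain[k] - x in earlier for x in earlier):
            return False
    return True


def power_via_chain(a, chain, m):
    """a^(chain[-1]) mod m, using one multiplication mod m per step of the addition chain."""
    powers = {1: a % m}
    for e in chain[1:]:
        x = next(x for x in powers if e - x in powers)   # e = x + (e - x), both already computed
        powers[e] = powers[x] * powers[e - x] % m
    return powers[chain[-1]]


binary_chain = [1, 2, 3, 4, 7, 8, 15, 16, 31]
short_chain = [1, 2, 3, 6, 12, 24, 30, 31]
for chain in (binary_chain, short_chain):
    print(is_addition_chain(chain), len(chain) - 1, power_via_chain(5, chain, 161) == pow(5, 31, 161))
# prints True 8 True, then True 7 True
```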

For most exponents $n$, there is an addition chain which is substantially shorter than $j_1+\ell-1$, though never less than half that size. There are many open questions about addition chains. The best known is Scholz’s conjecture that the shortest addition chain for $2^n-1$ has length $\le n-1$ plus the length of the shortest addition chain for $n$. For much more on addition chains, see Knuth’s classic book [Knu98].


7.14. Running time: The desirability of polynomial time algorithms 


In this section we discuss how to measure how fast an algorithm is. The inputs into the algorithm in the previous section for calculating $a^n \pmod m$ are the integers $a$ and $m$, with $1\le a<m$, and the exponent $n$. We will suppose that $m$ has $d$ digits (so that $d$ is proportional to $\log m$). The usual algorithms for adding and subtracting integers with $d$ digits take about $2d$ steps, whereas the usual algorithm for multiplication takes about $d^2$ steps.⁶


Exercise 7.14.1. Justify that multiplying two residues mod $m$ together and reducing mod $m$ takes no more than $2d^2$ steps.


The algorithm described in the previous section involves about $c\log n$ multiplications of two residues mod $m$, for some constant $c>0$, and so the total number of steps is proportional to
$$(\log m)^2\log n.$$


Is this good? Given any mathematical problem, the cost (measured by the number of steps) of an algorithm to resolve the question must include the time taken to read the input data, which can be measured by the number of digits, $D$, in the input. In this case the input is the numbers $a$, $m$, and $n$, so that $D$ is proportional to $\log m+\log n$. Now if $a$ and $m$ are fixed and we allow $n$ to grow, then the algorithm takes about $CD$ steps for some constant $C>0$, which is $C$ times as long as it takes to read the input. You cannot hope to do much better than that. On the other hand, if $m$ and $n$ are roughly the same size, then the algorithm takes time proportional to $D^3$. We still regard this as fast: any algorithm whose speed is bounded by a polynomial in $D$ is a polynomial time algorithm and is considered to be pretty fast.

⁶Since we have to multiply each pair of digits together, one from each of the given numbers.


It is important to distinguish between a mathematical problem and an algorithm used for resolving it. There can be many choices of algorithm, and one wants a fast one. However, if we only know a slowish algorithm, even one that seems clever, that does not mean that there is no fast algorithm.


Let P be the class of problems that can be resolved by an algorithm that runs in polynomial time. Few mathematical problems belong to P, and the key question is whether we can identify which ones do. We’ll discuss P in section 10.4.


Exercise 7.14.2. Prove that the Euclidean algorithm works in polynomial time. 


Appendix 7B. Orders and primitive roots


The problem of finding primitive roots is one of the deepest mysteries of numbers.
— from Opuscula Analytica 1, 152 (1783) by L. EULER 


It is easy to determine whether a given integer g is a primitive root mod p by 
using Corollary Moreover, given one primitive root one can find them all 
since the set of all $\phi(p-1)$ primitive roots is given by
$$\{g^j \pmod p : 1\le j\le p-1 \text{ and } (j,p-1) = 1\}.$$
The proportion of reduced residues that are primitive roots, $\frac{\phi(p-1)}{p-1}$, is rarely small. Therefore to find a primitive root we can select and test random elements mod $p$, and we should quickly be lucky. However, Gauss described a search method that is more efficient than this, stemming from a different description of the primitive roots.
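The Python sketch below (helper names are hypothetical) tests whether $g$ is a primitive root mod $p$ using the standard criterion that $g^{(p-1)/q}\not\equiv 1 \pmod p$ for every prime $q$ dividing $p-1$, and then, given one primitive root, lists all $\phi(p-1)$ of them as the powers $g^j$ with $(j,p-1)=1$. The trial-division factorization is only meant for small $p$.

```python
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n, by trial division (fine for small n)."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def is_primitive_root(g, p):
    """g is a primitive root mod the prime p iff g^((p-1)/q) != 1 (mod p) for every prime q | p-1."""
    return g % p != 0 and all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def all_primitive_roots(g, p):
    """Given one primitive root g, return the residues g^j mod p with gcd(j, p-1) = 1, sorted."""
    return sorted(pow(g, j, p) for j in range(1, p) if gcd(j, p - 1) == 1)


p = 13
g = next(h for h in range(2, p) if is_primitive_root(h, p))
print(g, all_primitive_roots(g, p))   # 2 [2, 6, 7, 11]
```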


7.15. Constructing primitive roots modulo p 


Gauss’s efficient algorithm for constructing primitive roots mod p stems from a 
different description of primitive roots. 


Proposition 7.15.1. Suppose that $p-1 = \prod_q q^{e_q}$. The set of primitive roots mod $p$ is precisely the set
$$\Big\{\prod_{q\mid p-1} A_q : A_q \text{ has order } q^{e_q} \pmod p\Big\}.$$


To prove this we need the following: 


Lemma 7.15.1 (Legendre). If $\mathrm{ord}_m(a) = k$ and $\mathrm{ord}_m(b) = \ell$ where $(k,\ell)=1$, then $\mathrm{ord}_m(ab) = k\ell$.




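Proof. Since $(ab)^{k\ell} = (a^k)^\ell(b^\ell)^k \equiv 1^\ell 1^k = 1 \pmod m$, we see that $\mathrm{ord}_m(ab)\mid k\ell$, so we may write $\mathrm{ord}_m(ab) = k_1\ell_1$ where $k_1\mid k$ and $\ell_1\mid\ell$ (by exercise 4.2.2). Now
$$a^{k_1\ell} \equiv a^{k_1\ell}(b^\ell)^{k_1} = \big((ab)^{k_1\ell_1}\big)^{\ell/\ell_1} \equiv 1 \pmod m,$$
so that $k\mid k_1\ell$ by Lemma 7.1.2. As $(k,\ell)=1$ we deduce that $k\mid k_1$, and so $k_1 = k$. Analogously we have $\ell_1 = \ell$ and the result follows.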


Proof of Proposition 7.15.1. Lemma 7.15.1 (applied repeatedly) implies that each $\prod_{q\mid p-1} A_q$ is a primitive root mod $p$.

These products are all distinct, for if $\prod_{q\mid p-1} A_q \equiv \prod_{q\mid p-1} B_q \pmod p$, then, raising this to the power $\ell$ where $\ell\equiv 0 \pmod{(p-1)/q^{e_q}}$ and $\ell\equiv 1 \pmod{q^{e_q}}$, we see that
$$A_q \equiv \Big(\prod_{q_1\mid p-1} A_{q_1}\Big)^\ell \equiv \Big(\prod_{q_1\mid p-1} B_{q_1}\Big)^\ell \equiv B_q \pmod p$$
for each prime $q\mid p-1$.

Finally, by Theorem 7.6 we know that there are $\phi(q^{e_q})$ such $A_q$, and therefore a total of $\prod_{q\mid p-1}\phi(q^{e_q}) = \phi(p-1)$ different such products. That is, they give all of the $\phi(p-1)$ primitive roots.


Proposition 7.15.1 provides a satisfactory way to construct primitive roots provided we can find the $A_q$ of order $q^{e_q}$:


Lemma 7.15.2. Suppose that $a^{(p-1)/q}\not\equiv 1 \pmod p$. If $q^e$ divides $p-1$, then $A_q := a^{(p-1)/q^e} \bmod p$ has order $q^e$ mod $p$.


Proof. Now $A_q^{q^e} = a^{p-1}\equiv 1 \pmod p$ and $A_q^{q^{e-1}} = a^{(p-1)/q}\not\equiv 1 \pmod p$. Therefore $\mathrm{ord}_p(A_q)$ divides $q^e$ and not $q^{e-1}$, and so $\mathrm{ord}_p(A_q) = q^e$.


GAUSS’S ALGORITHM to find primitive roots: For each prime power $q^e\,\|\,p-1$:
• Find an integer $a_q$ for which $a_q^{(p-1)/q}\not\equiv 1 \pmod p$.
• Let $A_q \equiv a_q^{(p-1)/q^e} \pmod p$.

Then $\prod_{q\mid p-1} A_q$ is a primitive root (mod $p$).
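A minimal Python sketch of Gauss’s algorithm (the names `factorize`, `gauss_primitive_root`, and `order_mod` are ours, and the trial-division factorization is only meant for small $p$). Each $a_q$ is drawn at random, the strategy discussed in the next paragraphs; the final loop checks by brute force that the product really has order $p-1$.

```python
import random


def factorize(n):
    """Return {q: e} with n = prod q**e, by trial division (fine for small n)."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gauss_primitive_root(p, seed=0):
    """Primitive root mod an odd prime p, built as a product of elements A_q of order q^e."""
    rng = random.Random(seed)
    g = 1
    for q, e in factorize(p - 1).items():
        while True:                               # random a_q with a_q^((p-1)/q) != 1 (mod p)
            a_q = rng.randrange(2, p)
            if pow(a_q, (p - 1) // q, p) != 1:
                break
        A_q = pow(a_q, (p - 1) // q**e, p)        # has order exactly q^e (Lemma 7.15.2)
        g = g * A_q % p
    return g


def order_mod(a, m):
    """Multiplicative order of a mod m by brute force (assumes gcd(a, m) = 1 and m > 1)."""
    k, x = 1, a % m
    while x != 1:
        x, k = x * a % m, k + 1
    return k


for p in (7, 13, 101, 1009):
    g = gauss_primitive_root(p)
    print(p, g, order_mod(g, p) == p - 1)
```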


How do we find appropriate $a_q$? We know that there are exactly $(p-1)/q$ roots mod $p$ of $x^{(p-1)/q}\equiv 1 \pmod p$, that is, a proportion $\frac1q$ of the reduced residues mod $p$. For the remaining $a \pmod p$ we have $a^{(p-1)/q}\not\equiv 1 \pmod p$ as desired, that is, for a proportion $1-\frac1q$ of the reduced residues mod $p$.


One can try to find $a_q$ by trying $2,3,5,7,\dots$ until one finds an appropriate number, but there are no guarantees that this will succeed in a reasonable time period. However, if we select values of $a$ at random, then the probability that we fail to find an appropriate $a_q$ after $k$ tries is $1/q^k$, which is negligible for $k>20$.


Finding elements of order $n$. For any integer $n$ dividing $p-1$ the residue $h\equiv g^{(p-1)/n} \pmod p$ provides a solution to $x^n\equiv 1 \pmod p$, where $g$ is a primitive root, perhaps found by the method described above. Moreover, $\mathrm{ord}_p(h) = n$. The set of solutions is
$$\{u \pmod p : u^n\equiv 1 \pmod p\} = \{h^j \pmod p : 1\le j\le n\}.$$
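For a small numerical check (a Python sketch; the values $p=13$, $g=2$, $n=4$ are our own choice, and 2 is a primitive root mod 13), we form $h\equiv g^{(p-1)/n}$ and compare its powers with a brute-force list of the solutions of $u^n\equiv 1\pmod p$:

```python
p, g, n = 13, 2, 4               # 2 is a primitive root mod 13, and 4 divides 12
h = pow(g, (p - 1) // n, p)      # h = g^((p-1)/n) has order n
powers_of_h = sorted(pow(h, j, p) for j in range(1, n + 1))
brute_force = [u for u in range(1, p) if pow(u, n, p) == 1]
print(h, powers_of_h)            # 8 [1, 5, 8, 12]
print(powers_of_h == brute_force)  # True
```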




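An alternative way to find a residue $h$ mod $p$ of order $n$ is to modify Gauss’s algorithm: one only needs to determine the $a_q$ for the primes $q$ dividing $n$, and then one can take
$$h\equiv\prod_{q^f\,\|\,n} a_q^{(p-1)/q^f} \pmod p,$$
since each factor $a_q^{(p-1)/q^f}$ has order $q^f$ by Lemma 7.15.2.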


7.16. Indices / Discrete Logarithms 


Suppose that $g$ is a given primitive root (mod $p$). If $b\equiv g^e \pmod p$, then $e$ is the index, or discrete logarithm, of $b$ in base $g$, denoted $\mathrm{ind}_g(b)$. This value is only determined mod $p-1$. It is a challenging open problem to determine the discrete logarithm of a given residue in a given base in a short amount of time.
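By way of contrast, here is a sketch of the baby-step giant-step method (Shanks’s algorithm, which is not discussed in the text; Python 3.8+ for `pow` with a negative exponent). It finds $\mathrm{ind}_g(b)$ using roughly $\sqrt{p}$ multiplications and memory, far better than trying all $p-1$ exponents but still exponential in the number of digits of $p$, so it does not resolve the open problem just mentioned.

```python
from math import isqrt


def discrete_log(g, b, p):
    """Baby-step giant-step: return e in [0, p-2] with g^e = b (mod p), or None.

    Assumes p is prime and g is a primitive root mod p.
    """
    n = p - 1
    s = isqrt(n) + 1                 # s*s > n, so every e in [0, n) has the form i*s + j
    baby = {}
    x = 1
    for j in range(s):               # baby steps: remember g^j -> j
        baby.setdefault(x, j)
        x = x * g % p
    step = pow(g, -s, p)             # g^(-s) mod p
    y = b % p
    for i in range(s):               # giant steps: look up b * g^(-i*s)
        if y in baby:
            return (i * s + baby[y]) % n
        y = y * step % p
    return None


print(discrete_log(2, 3, 13))        # 4, since 2^4 = 16 = 3 (mod 13)
```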
Exercise 7.16.1. (a) Show that $\mathrm{ind}_g(ab)\equiv\mathrm{ind}_g(a)+\mathrm{ind}_g(b) \pmod{p-1}$.

(b) Show that $\mathrm{ind}_g(1) = 0$ and $\mathrm{ind}_g(-1) = (p-1)/2$, irrespective of the base used.

(c) Show that $\mathrm{ind}_g(a^n)\equiv n\,\mathrm{ind}_g(a) \pmod{p-1}$.


There are several parameters that go into the definition of index. The one that 
appears to be of some concern at first sight is the choice of primitive root to use as 
a base. The next result shows that there is little difference between the choice of 
one base and another. 


Exercise 7.16.2. Suppose that $g$ and $h$ are two primitive roots mod $p$, where $h\equiv g^\ell \pmod p$.

(1) Show that $(\ell,p-1) = 1$.
(2) Show that the index with respect to $g$ is $\ell$ times the index with respect to $h$, mod $p-1$.
(3) Prove that there exists an integer $m$ for which $g\equiv h^m \pmod p$.


We have described residues in terms of index and in terms of order. What is 
the link between the two? 


Proposition 7.16.1. For any reduced residue $a$ mod $p$ we have
$$\mathrm{ord}_p(a)\cdot(p-1,\mathrm{ind}_g(a)) = p-1.$$


Proof. Let $g$ be a primitive root with $k = \mathrm{ind}_g(a)$ and let $m = \mathrm{ord}_p(a)$. This means that $m$ is the smallest positive integer for which $g^{km}\equiv a^m\equiv 1 \pmod p$, that is, the smallest positive integer for which $p-1$ divides $km$, by Lemma 7.1.2. Therefore $(p-1)/(p-1,k)$ divides $m$ by Corollary 3.2.1, and the smallest such integer $m$ is
$$\frac{p-1}{(p-1,k)}.$$


Exercise 7.16.3. (a) Suppose that $k$ divides $p-1$. Show that $a$ is a $k$th power mod $p$ if and only if $k$ divides $\mathrm{ind}_g(a)$.
(b) Show that if $a$ has order $m$ mod $p$, then $\{a^k \pmod p : 1\le k\le m,\ (k,m)=1\}$ is the set of residues mod $p$ of order $m$.


7.17. Primitive roots modulo prime powers 


A primitive root mod $m$ (whether $m$ is prime or not) is a residue $g \pmod m$ whose powers, $1, g, g^2,\dots,g^{\phi(m)-1} \pmod m$, give all of the $\phi(m)$ reduced residues mod $m$. We will show that, for a prime power $m$, there are primitive roots mod $m$ if and only if $m$ is not of the form $2^k$ with $k\ge 3$.




Proposition 7.17.1. Given integer $a$ coprime to odd prime $p$, let $\ell = \mathrm{ord}_p(a)$. Therefore we can write $a^\ell = 1+p^j m$ for some integer $j\ge 1$ where $p\nmid m$. Then, for all $k\ge j$ we have $\mathrm{ord}_{p^k}(a) = p^{k-j}\ell$.

(If $p=2$, the same result holds for $k\ge j$ if we now define $\ell = \mathrm{ord}_4(a) = 1$ or 2, so that $j\ge 2$.)


The proof of Proposition 7.17.1 depends on the following “lifting lemma”.


Lemma 7.17.1. Let $p^k$ be a prime power and $a = b+p^k m$ where $p\nmid bm$ and $k\ge 1$.
(a) If $a^r\equiv b^r \pmod{p^{k+1}}$, then $p$ divides $r$.
(b) If $p^k>2$, then $a^p = b^p + p^{k+1}M$ for some integer $M$ that is not divisible by $p$.


Proof. Using the binomial theorem we have
$$a^r = (b+p^k m)^r = \sum_{j=0}^{r}\binom{r}{j} b^{r-j}(p^k m)^j \equiv b^r + rb^{r-1}mp^k + \binom{r}{2}b^{r-2}m^2p^{2k} \pmod{p^{3k}},$$
since subsequent terms are divisible by $p^{3k}$, which is divisible by $p^{k+2}$. We deduce that $a^r\equiv b^r \pmod{p^{k+1}}$ if and only if $p$ divides $rb^{r-1}m$, and so $p$ divides $r$, as $p\nmid bm$. This gives (a).

For (b) we let $r=p$. If $k\ge 2$ or $p\ne 2$, then $\binom{p}{2}p^{2k}\equiv 0 \pmod{p^{k+2}}$, so that $a^p\equiv b^p + b^{p-1}mp^{k+1} \pmod{p^{k+2}}$, and the result follows with the integer $M\equiv b^{p-1}m\equiv m \pmod p$.


Proof of Proposition 7.17.1. We begin by proving, by induction, that $a^{p^{k-j}\ell} = 1+p^k m_k$ for some integer $m_k$ that is not divisible by $p$, for all $k\ge j$. It is true by hypothesis for $k=j$. Then we use Lemma 7.17.1(b), with $a$ replaced by $a^{p^{k-j}\ell}$ and $b=1$, to deduce the claim for $k+1$.

Now we prove that $\mathrm{ord}_{p^k}(a) = p^{k-j}\ell$ for all $k\ge j$, by induction.

Let $r = \mathrm{ord}_{p^j}(a)$. Then $r$ divides $\ell$, and $a^r\equiv 1 \pmod{p^j}$ implies $a^r\equiv 1 \pmod p$, and so $\ell$ divides $r$. Therefore $r = \mathrm{ord}_{p^j}(a) = \ell$.

Now suppose that the result is true mod $p^k$ and let $r = \mathrm{ord}_{p^{k+1}}(a)$. Now $a^r\equiv 1 \pmod{p^{k+1}}$ implies that $a^r\equiv 1 \pmod{p^k}$, and so $p^{k-j}\ell$ divides $r$, say $r = Rp^{k-j}\ell$. Now $a^{p^{k-j}\ell} = 1+p^k m_k\not\equiv 1 \pmod{p^{k+1}}$, and so $R\ne 1$. Then Lemma 7.17.1(a), with $a$ replaced by $a^{p^{k-j}\ell}$ and $b=1$, shows that $R$ must be divisible by $p$. On the other hand $a^{p^{k+1-j}\ell} = 1+p^{k+1}m_{k+1}\equiv 1 \pmod{p^{k+1}}$, so $r$ divides $p\cdot p^{k-j}\ell$, that is, $R$ divides $p$. Therefore $R = p$.
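A numerical sanity check of Proposition 7.17.1 (a brute-force Python sketch; the test cases are our own choice): we compute $\ell$ and $j$ from $a^\ell - 1 = p^j m$ and compare $p^{k-j}\ell$ with the order of $a$ mod $p^k$ computed directly.

```python
def order_mod(a, m):
    """Multiplicative order of a mod m by brute force (assumes gcd(a, m) = 1 and m > 1)."""
    k, x = 1, a % m
    while x != 1:
        x, k = x * a % m, k + 1
    return k


def ell_and_j(a, p):
    """ell = ord_p(a), and j with a^ell = 1 + p^j * m where p does not divide m."""
    ell = order_mod(a, p)
    t, j = a**ell - 1, 0
    while t % p == 0:
        t //= p
        j += 1
    return ell, j


for a, p in [(7, 5), (2, 3), (10, 3)]:
    ell, j = ell_and_j(a, p)
    for k in range(j, j + 4):
        assert order_mod(a, p**k) == p ** (k - j) * ell   # Proposition 7.17.1
    print(a, p, ell, j)
# prints 7 5 4 2, then 2 3 2 1, then 10 3 1 2
```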


We define Carmichael’s $\lambda$-function $\lambda(m)$ to be the maximal order of a reduced residue mod $m$, so that $\lambda(p) = p-1$ for all primes $p$, because there are primitive roots for all primes $p$.


Corollary 7.17.1. If $p^k$ is an odd prime power, then there is a primitive root $g \pmod{p^k}$, so that $\lambda(p^k) = \phi(p^k)$ and each $\mathrm{ord}_{p^k}(a)$ divides $\phi(p^k)$. Moreover, if $g$ is a primitive root mod $p^2$, then it is a primitive root mod $p^k$ for all $k\ge 1$. We also have $\lambda(2) = 1$, $\lambda(4) = \lambda(8) = 2$, and $\lambda(2^k) = \phi(2^{k-1}) = \mathrm{ord}_{2^k}(3)$ for all $k\ge 3$.


Proof. Let $a$ be as in the hypothesis of Proposition 7.17.1. If $p$ is an odd prime, then $\ell$ divides $p-1$ and so $\mathrm{ord}_{p^k}(a) = p^{k-j}\ell\le p^{k-1}(p-1) = \phi(p^k)$. To obtain equality we need $\ell = p-1$, that is, $a$ is a primitive root mod $p$, and $j=1$, that is, $p^2$ does not divide $a^{p-1}-1$.

There exists a primitive root $g$ mod $p$ by Corollary 7.5.1. If $g^{p-1}\not\equiv 1 \pmod{p^2}$, then let $a=g$. Otherwise let $a = g+p$, so that $a^{p-1}\not\equiv g^{p-1}\equiv 1 \pmod{p^2}$ by Lemma 7.17.1(a). Either way, $\ell = p-1$ and $a^{p-1}\not\equiv 1 \pmod{p^2}$, so $\mathrm{ord}_{p^k}(a) = \phi(p^k)$, the maximum possible. We deduce that $\lambda(p^k) = \phi(p^k)$, and then for every reduced residue class $b$, we know that $\mathrm{ord}_{p^k}(b)$ divides $\phi(p^k) = \lambda(p^k)$.

For $p=2$, we have $a^2\equiv 1 \pmod 8$ for all odd $a$, and $3^2 = 1+8$. As in the above argument, we deduce that $\lambda(2^k) = \mathrm{ord}_{2^k}(3) = \phi(2^{k-1})$ for all $k\ge 3$.


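Corollary 7.17.2. If the prime power $p^k$ equals 2 or 4, or is odd, then
$$(\mathbb{Z}/p^k\mathbb{Z})^*\cong\mathbb{Z}/\phi(p^k)\mathbb{Z}. \tag{7.17.1}$$
Otherwise $p=2$ and $k\ge 3$, in which case
$$(\mathbb{Z}/2^k\mathbb{Z})^*\cong\mathbb{Z}/2^{k-2}\mathbb{Z}\oplus\mathbb{Z}/2\mathbb{Z}. \tag{7.17.2}$$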


Proof. If $p^k = 2, 4$ or is odd, then Corollary 7.17.1 states that $\lambda(p^k) = \phi(p^k)$, so all of the reduced residues must be a power of the primitive root. If $p=2$ and $k\ge 3$, then we saw that 3 has order $2^{k-2}$ in the proof of Corollary 7.17.1. Reducing mod 8, we see that $-1$ cannot be a power of 3 mod $2^k$ (the powers of 3 are $\equiv 1$ or $3 \pmod 8$, whereas $-1\equiv 7 \pmod 8$), so the residues
$$\{3^\alpha(-1)^\beta : 0\le\alpha\le 2^{k-2}-1 \text{ and } 0\le\beta\le 1\}$$
are distinct and therefore give all the $2^{k-1} = \phi(2^k)$ reduced residues mod $2^k$.
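For instance, with $k=6$ this can be confirmed by machine (a short Python check; the choice $k=6$ is arbitrary): 3 has order $2^{k-2} = 16$ mod 64, and the numbers $\pm 3^\alpha$ reduced mod 64 are exactly the 32 odd residues.

```python
k = 6
m = 2 ** k
order_of_3 = next(e for e in range(1, m) if pow(3, e, m) == 1)
residues = {(-1) ** beta * pow(3, alpha, m) % m
            for alpha in range(2 ** (k - 2)) for beta in (0, 1)}
print(order_of_3)                          # 16
print(residues == set(range(1, m, 2)))     # True
```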


Exercise 7.17.1. Use Euler’s Theorem and Lemma 7.1.2 to prove that $\lambda(m)$ divides $\phi(m)$. Prove that there is a primitive root mod $m$ if and only if $\lambda(m) = \phi(m)$.


Exercise 7.17.2.1 Suppose that $p^k$ is a prime power dividing $n$ and let $m = \mathrm{ord}_p(2)$. Prove that $\mathrm{ord}_{p^k}(2)$ divides $n-1$ if and only if $m$ divides $(p-1,n-1)$ and $p^k$ divides $2^m-1$.


Exercise 7.17.3. Let $x_n = a^n-b^n$ and suppose that $m = m_p$ is the smallest positive integer for which prime $p$ divides $x_m$. In exercise 1.7.24 we proved that $p\mid x_n$ if and only if $m\mid n$. Now suppose that $p^k\,\|\,x_m$, and $m\mid n$ so that $x_m\mid x_n$. Prove that the power of $p$ that divides $x_n/x_m$ equals the power of $p$ that divides $n/m$. (This also follows from exercise 7.33.2(c).)


Exercise 7.17.4.1 Suppose that $q^n-p^m = 1$ where $p$ is prime, with $m\ge 1$ and $n\ge 2$.
(a) Prove that if $m=1$, then $q=2$, $n$ is prime, and $p$ is a Mersenne prime.
(b) Prove that $q-1 = p^k$ for some integer $k\ge 0$.
For now assume that $k\ge 1$ (and use Lemma 7.17.1 throughout).
(c) Prove that $n$ is a power of $p$.
(d) Prove that if $p^k>2$, then $q^p-1 = p^{k+1}$, which is impossible.
(e) Deduce that $p=2$ and $q=3$, so that $9-8=1$ by exercise 7.10.3.
Now we may assume that $m\ge 2$ and $k=0$ so that $q=2$.
(f) Suppose that $r$ divides $m$ with $m/r$ odd. Prove that $p^r+1 = 2^j$ for some integer $j\ge 1$.
(g) Deduce that $m=r$ and therefore that $m$ is a power of 2.
(h) Deduce that $p^m+1\equiv 2 \pmod 8$ so that $n=1$, which is impossible.

Therefore the only solution to $q^n-p^m=1$ with $p$ prime and $m,n\ge 2$ is $3^2-2^3=1$.


Exercise 7.17.5.' Prove that $(1+x)^{p^n}\equiv(1+x^p)^{p^{n-1}} \pmod{p^n}$ for all prime powers $p^n$.


7.18. Orders modulo composites 


We now show that if m has two or more distinct prime factors, then there is no 
primitive root mod m. Moreover we determine the structure of the multiplicative 
group of reduced residues mod m. We can understand orders of reduced residues 
modulo a composite number m by breaking m up into its prime power factors: 


Proposition 7.18.1. If $(a,m)=1$, then $\mathrm{ord}_m(a) = \mathrm{lcm}[\mathrm{ord}_{p^k}(a) : p^k\,\|\,m]$ divides $\lambda(m) = \mathrm{lcm}[\lambda(p^k) : p^k\,\|\,m]$.


Proof. By induction on the number of distinct prime factors of $m$. It holds when $m$ is a prime power $p^k$, by Corollary 7.17.1. We now assume that $m = rs$ where $r$ and $s$ are coprime integers for which the result is proved.

Let $k = \mathrm{ord}_m(a)$ so that $a^k\equiv 1 \pmod m$. Therefore $a^k\equiv 1 \pmod r$ and $a^k\equiv 1 \pmod s$, so that $[\mathrm{ord}_r(a),\mathrm{ord}_s(a)]$ divides $k = \mathrm{ord}_m(a)$ by Lemma 7.1.2.

On the other hand let $h = [\mathrm{ord}_r(a),\mathrm{ord}_s(a)]$ so that $a^h\equiv 1 \pmod r$ and $a^h\equiv 1 \pmod s$. Therefore $a^h\equiv 1 \pmod m$ by the Chinese Remainder Theorem, and therefore $\mathrm{ord}_m(a)$ divides $h = [\mathrm{ord}_r(a),\mathrm{ord}_s(a)]$ by Lemma 7.1.2.

Combining the last two paragraphs yields that $\mathrm{ord}_m(a) = [\mathrm{ord}_r(a),\mathrm{ord}_s(a)]$.

By the induction hypothesis, $\mathrm{ord}_r(a)$ divides $\lambda(r)$ and $\mathrm{ord}_s(a)$ divides $\lambda(s)$, so that $\mathrm{ord}_m(a) = [\mathrm{ord}_r(a),\mathrm{ord}_s(a)]$ divides $[\lambda(r),\lambda(s)]$. Therefore $\lambda(m)$ divides $[\lambda(r),\lambda(s)]$.

On the other hand select $b \pmod r$ so that $\mathrm{ord}_r(b) = \lambda(r)$ and select $c \pmod s$ so that $\mathrm{ord}_s(c) = \lambda(s)$. Now we select $a \pmod m$ so that $a\equiv b \pmod r$ and $a\equiv c \pmod s$, using the Chinese Remainder Theorem. Therefore we have $\mathrm{ord}_m(a) = [\mathrm{ord}_r(b),\mathrm{ord}_s(c)] = [\lambda(r),\lambda(s)]$. Combining this with the last paragraph we see that this must be the maximum possible order mod $m$; that is, $\lambda(m) = [\lambda(r),\lambda(s)]$.
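A Python sketch (3.9+ for `math.lcm`; helper names are ours) that computes $\lambda(m)$ from the prime power values, using $\lambda(2^k) = 2^{k-2}$ for $k\ge 3$ and $\lambda(p^k) = \phi(p^k)$ otherwise as in Corollary 7.17.1, combined by the lcm as in Proposition 7.18.1, and compares the result with the definition of $\lambda(m)$ as a maximal order for small $m$:

```python
from math import gcd, lcm


def factorize(n):
    """Return {p: k} with n = prod p**k, by trial division."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def carmichael_prime_power(p, k):
    """lambda(p^k), as in Corollary 7.17.1."""
    if p == 2 and k >= 3:
        return 2 ** (k - 2)
    return (p - 1) * p ** (k - 1)


def carmichael(m):
    """lambda(m) = lcm of lambda(p^k) over p^k || m (Proposition 7.18.1)."""
    return lcm(*(carmichael_prime_power(p, k) for p, k in factorize(m).items()))


def order_mod(a, m):
    """Multiplicative order of a mod m by brute force (assumes gcd(a, m) = 1 and m > 1)."""
    k, x = 1, a % m
    while x != 1:
        x, k = x * a % m, k + 1
    return k


def carmichael_brute(m):
    """lambda(m) as the maximal order of a reduced residue mod m (m > 1)."""
    return max(order_mod(a, m) for a in range(1, m) if gcd(a, m) == 1)


assert all(carmichael(m) == carmichael_brute(m) for m in range(2, 101))
print(carmichael(24), carmichael(100))   # 2 20
```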


Exercise 7.18.1.1 Prove that $\lambda(m)<\phi(m)$ if $m$ is divisible by $4p$ or $pq$ for odd primes $p<q$.


Corollary 7.18.1. There is a primitive root mod $m$ if and only if $m = 2$ or 4 or $p^k$ or $2p^k$ where $p$ is an odd prime.


Proof. If $\lambda(m) = \phi(m)$, then $m$ is of the form $2^k$ or $p^k$ or $2p^k$ for some odd prime $p$ and integer $k\ge 1$ by exercise 7.18.1. In Corollary 7.17.1 we saw that $\lambda(m)<\phi(m)$ if $m = 2^k$ with $k\ge 3$, and we showed that $\lambda(m) = \phi(m)$ if $m = 2$ or 4 or $p^k$ where $p$ is an odd prime. Finally $\lambda(2p^k) = \mathrm{lcm}[\lambda(p^k),\lambda(2)] = \lambda(p^k) = \phi(2p^k)$.


Exercise 7.18.2. Determine $\lambda(65520)$.

Exercise 7.18.3. Prove that $a^{\lambda(m)}\equiv 1 \pmod m$ for all integers $a$ coprime to $m$.

Exercise 7.18.4. Show that composite $n$ is a Carmichael number if and only if $\lambda(n)$ divides $n-1$.

Exercise 7.18.5. Let $N_m(n) = \#\{x \pmod n : x^m\equiv 1 \pmod n\}$ for some given integer $m\ge 2$.
(a) Prove that $N_m(n)$ is a multiplicative function of $n$.

(b)? Prove that $x^m\equiv 1 \pmod n$ if and only if $x^g\equiv 1 \pmod n$ where $g = (m,\lambda(n))$.

(c) Deduce that $N_m(n) = N_g(n)$ where $g = (m,\lambda(n))$.

(d) Use Theorem 7.6 to determine $N_m(p)$ for every prime $p$.


Exercise 7.18.6. Prove that 2 is a primitive root mod $3^m$ for all $m\ge 1$, and mod $5^n$ for all $n\ge 1$, and mod $11^r$ for all $r\ge 1$.


Appendix 7C. Finding nth roots modulo prime powers


7.19. nth roots modulo p 


Given $n$, $a$, and $p$ with $(a,p)=1$ we are interested in finding all of the solutions $x \pmod p$ to
$$x^n\equiv a \pmod p. \tag{7.19.1}$$
The question is equivalent to one in which the exponent is a divisor of $p-1$.


Theorem 7.9. Suppose that $(a,p)=1$ and let $g = (n,p-1)$. The solutions $x \pmod p$ to $x^n\equiv a \pmod p$ are in 1-to-1 correspondence with the solutions $y \pmod p$ to $y^g\equiv a \pmod p$.


Proof. Given $x$ let $y\equiv x^{n/g} \pmod p$ so that $y^g\equiv x^n\equiv a \pmod p$. On the other hand, suppose that $g = un+v(p-1)$ for some integers $u$ and $v$. Given $y$ let $x\equiv y^u \pmod p$, and so $x^n = (y^u)^n\equiv(y^u)^n(y^{p-1})^v = y^g\equiv a \pmod p$.


We can therefore restrict our attention to the case that $n$ divides $p-1$. We can provide easily verified conditions for there to be a solution to (7.19.1), and given one solution we can find them all quickly.


Proposition 7.19.1. Suppose that $(a,p)=1$ and $n$ divides $p-1$.
(a) There are solutions $x \pmod p$ of (7.19.1) if and only if $a^{(p-1)/n}\equiv 1 \pmod p$.
(b) Given one solution $x_0$, the set of all solutions mod $p$ to (7.19.1) is given by $x_0u \pmod p$ as $u$ runs through the $n$ roots of $u^n\equiv 1 \pmod p$.


Proof. (a) If there is a solution to (7.19.1), then $a^{(p-1)/n}\equiv(x^n)^{(p-1)/n} = x^{p-1}\equiv 1 \pmod p$. On the other hand if $a^{(p-1)/n}\equiv 1 \pmod p$, then $\mathrm{ord}_p(a)$ divides $(p-1)/n$ and so $n$ divides $\mathrm{ind}_g(a)$ by Proposition 7.16.1, where $g$ is a primitive root mod $p$. Writing $\mathrm{ind}_g(a) = nk$, we let $x = g^k$ so that $x^n = (g^k)^n = g^{kn}\equiv a \pmod p$.

(b) Given a solution $x \pmod p$ to (7.19.1), let $u\equiv x/x_0$ so that $u^n\equiv x^n/x_0^n\equiv a/a = 1 \pmod p$. On the other hand if $x\equiv x_0u \pmod p$, then $x^n\equiv a\cdot 1 = a \pmod p$.
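A brute-force Python illustration of Proposition 7.19.1 (the function name is hypothetical, and the initial root $x_0$ is found by exhaustive search, which is exactly the step that the text says still needs a fast method): check the criterion in (a), then multiply one root by the $n$ solutions of $u^n\equiv 1$ as in (b).

```python
def nth_root_solutions(a, n, p):
    """All x (mod p) with x^n = a (mod p), for prime p, p not dividing a, and n | p-1."""
    assert (p - 1) % n == 0 and a % p != 0
    if pow(a, (p - 1) // n, p) != 1:          # criterion (a): no solutions
        return []
    x0 = next(x for x in range(1, p) if pow(x, n, p) == a % p)   # brute-force initial root
    units = [u for u in range(1, p) if pow(u, n, p) == 1]          # the n roots of u^n = 1
    return sorted(x0 * u % p for u in units)                       # part (b)


print(nth_root_solutions(5, 3, 13))   # [7, 8, 11]
print(nth_root_solutions(2, 3, 13))   # []
```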


In section 7.15 of appendix 7B we modified Gauss’s efficient algorithm to construct all of the $n$th roots of 1 mod $p$. Therefore to quickly determine all of the solutions to (7.19.1), we are left with the task of finding an initial solution $x_0$.


Exercise 7.19.1. Show that the solutions $x \pmod m$ to $x^n\equiv a \pmod m$ are in 1-to-1 correspondence with the solutions $y \pmod m$ to $y^g\equiv a \pmod m$ where $g = (n,\lambda(m))$.


Exercise 7.19.2. Prove that if odd prime $p$ does not divide $a$, then
$$\#\{x \pmod p : x^4\equiv a \pmod p\} = \begin{cases} 0 \text{ or } 4 & \text{if } p\equiv 1 \pmod 4,\\ 0 \text{ or } 2 & \text{if } p\equiv 3 \pmod 4.\end{cases}$$


7.20. Lifting solutions 


Gauss discovered that if an equation has solutions mod p, then one can often use 
those solutions to determine solutions to the same equation mod $p^k$.


Proposition 7.20.1. Suppose that $p$ does not divide $a$ and that $u^n\equiv a \pmod p$. If $p$ does not divide $n$, then, for each integer $k\ge 2$, there exists a unique congruence class $b \pmod{p^k}$ such that $b^n\equiv a \pmod{p^k}$ and $b\equiv u \pmod p$.


Proof. We prove this by induction on $k\ge 2$. We may assume that there exists a unique congruence class $b \pmod{p^{k-1}}$ such that $b^n\equiv a \pmod{p^{k-1}}$ and $b\equiv u \pmod p$. Therefore if $B^n\equiv a \pmod{p^k}$ and $B\equiv u \pmod p$, then $B^n\equiv a \pmod{p^{k-1}}$ and so $B\equiv b \pmod{p^{k-1}}$. Writing $B = b+mp^{k-1}$ we have
$$B^n = (b+mp^{k-1})^n\equiv b^n + nmp^{k-1}b^{n-1} \pmod{p^k},$$
which is $\equiv a \pmod{p^k}$ if and only if
$$m\equiv\frac{a-b^n}{np^{k-1}b^{n-1}}\equiv\frac{u}{na}\cdot\frac{a-b^n}{p^{k-1}} \pmod p,$$
as $ub^{n-1}\equiv u^n\equiv a \pmod p$. Since $p^{k-1}$ divides $a-b^n$, and $(an,p)=1$, the quantity on the right-hand side is a congruence class mod $p$ and therefore gives $m \pmod p$ uniquely.
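The proof is constructive, and translates into the following Python sketch (Python 3.8+ for modular inverses via `pow(x, -1, p)`; `lift_nth_root` is a hypothetical name). At each step $b$ is a root mod $p^{j-1}$, and we add $mp^{j-1}$ with $m\equiv\frac{a-b^n}{p^{j-1}}\cdot(nb^{n-1})^{-1} \pmod p$; this is the one-variable case of what is usually called Hensel’s lemma.

```python
def lift_nth_root(u, n, a, p, k):
    """Return b mod p^k with b^n = a (mod p^k) and b = u (mod p).

    Assumes p is prime, p does not divide a*n, and u^n = a (mod p).
    """
    assert (a * n) % p != 0 and (pow(u, n, p) - a) % p == 0
    b = u % p
    for j in range(2, k + 1):                  # b is a root mod p^(j-1); lift it to mod p^j
        q = p ** (j - 1)
        t = (a - pow(b, n, p * q)) // q        # exact division: p^(j-1) divides a - b^n
        m = t * pow(n * pow(b, n - 1, p), -1, p) % p
        b += m * q
    return b % p ** k


b = lift_nth_root(3, 2, 2, 7, 3)    # from 3^2 = 2 (mod 7) to a square root of 2 mod 7^3
print(b, pow(b, 2, 343))            # 108 2
```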


Exercise 7.20.1. Show that if prime $p\nmid an$, then the number of solutions $x \pmod{p^k}$ to $x^n\equiv a \pmod{p^k}$ does not depend on $k$.


We will focus on the case $n=2$ in Proposition 8.8.1. Proposition 7.20.1 may be generalized to more-or-less arbitrary polynomial equations, as we will see in Proposition 16.3.1.


We will also need the analogous result when $p$ divides $n$, or at least when $n=p$. The difficulty lies in the fact that every residue class $a \pmod p$ has a $p$th root mod $p$, namely $a$ itself, since $a^p\equiv a \pmod p$ by Fermat’s Little Theorem, but only $p$ residue classes $a \pmod{p^2}$ have a $p$th root. To see this, note that if $a\equiv b \pmod p$, then $a^p\equiv b^p \pmod{p^2}$ by Lemma 7.17.1, so that $x^p \bmod p^2$ depends only on $x \bmod p$.




Proposition 7.20.2. Let $p$ be a prime, with $\kappa = 2$ if $p$ is odd, and $\kappa = 3$ if $p=2$. Suppose that $p$ does not divide $a$ and that $u^p\equiv a \pmod{p^\kappa}$. For each integer $k\ge\kappa$, there exists a unique congruence class $b \pmod{p^{k-1}}$ such that $b^p\equiv a \pmod{p^k}$ and $b\equiv u \pmod{p^{\kappa-1}}$.


Proof. We prove this by induction on $k\ge\kappa$. This is immediate for $k=\kappa$. For $k\ge\kappa+1$ we may assume that there exists a unique congruence class $b \pmod{p^{k-2}}$ such that $b^p\equiv a \pmod{p^{k-1}}$ and $b\equiv u \pmod{p^{\kappa-1}}$, by the induction hypothesis. Therefore if $B^p\equiv a \pmod{p^k}$ and $B\equiv u \pmod{p^{\kappa-1}}$, then $B^p\equiv a \pmod{p^{k-1}}$ and so $B\equiv b \pmod{p^{k-2}}$. Writing $B = b+mp^{k-2}$ we have
$$B^p = (b+mp^{k-2})^p\equiv b^p + mp^{k-1}b^{p-1}\equiv b^p + mp^{k-1} \pmod{p^k}$$
as $b^{p-1}\equiv 1 \pmod p$ since $(b,p)=1$. This is $\equiv a \pmod{p^k}$ if and only if
$$m\equiv\frac{a-b^p}{p^{k-1}} \pmod p,$$
which yields a unique residue class as $p^{k-1}$ divides $a-b^p$. This implies that $B = b+mp^{k-2}$ occupies a unique residue class mod $p^{k-1}$.
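As with Proposition 7.20.1, the induction can be run as an algorithm. In this Python sketch (the name `lift_pth_root` is hypothetical), each pass adds $mp^{j-2}$ with $m\equiv(a-b^p)/p^{j-1} \pmod p$, and the result is only meaningful mod $p^{k-1}$, as the proposition says:

```python
def lift_pth_root(u, a, p, k):
    """Given u^p = a (mod p^kappa), return b mod p^(k-1) with b^p = a (mod p^k).

    kappa = 2 for odd p and kappa = 3 for p = 2, as in Proposition 7.20.2; assumes k >= kappa.
    """
    kappa = 3 if p == 2 else 2
    assert a % p != 0 and k >= kappa and (pow(u, p, p**kappa) - a) % p**kappa == 0
    b = u
    for j in range(kappa + 1, k + 1):        # b^p = a (mod p^(j-1)); correct it mod p^j
        t = (a - pow(b, p, p**j)) // p**(j - 1)
        b += (t % p) * p**(j - 2)            # m = (a - b^p)/p^(j-1) (mod p)
    return b % p**(k - 1)


b = lift_pth_root(1, 17, 2, 6)               # 1^2 = 17 (mod 8), lifted to mod 2^6
print(b, pow(b, 2, 64))                      # 9 17
```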


7.21. Finding nth roots quickly 


Suppose that there is a solution to $x^2\equiv a \pmod p$. This implies that $a^{(p-1)/2}\equiv 1 \pmod p$ by Proposition 7.19.1. We will try to find a solution to $x^2\equiv a \pmod p$ by taking $x\equiv a^k \pmod p$ for some integer $k$. This works if and only if $a^{2k}\equiv a \pmod p$. This holds for any such $a$ provided $\frac{p-1}{2}$ divides $2k-1$, which can only hold if $\frac{p-1}{2}$ is odd, that is, $p\equiv 3 \pmod 4$, in which case we can take $k = \frac{p+1}{4}$ (see exercise 8.2.4). We now rework this in a more general setting.


Proposition 7.21.1. Suppose that $n$ divides $p-1$ and that $a$ is an $n$th power mod $p$. Assume that $(n,\frac{p-1}{n}) = 1$, so that there exists an integer $k$ for which $nk\equiv 1 \pmod{\frac{p-1}{n}}$. If $x\equiv a^k \pmod p$, then $x^n\equiv a \pmod p$.

Proof. Since $a$ is an $n$th power mod $p$ we know that $a^{\frac{p-1}{n}}\equiv 1 \pmod p$, and therefore $x^n\equiv a^{nk}\equiv a \pmod p$.
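In code (a Python 3.8+ sketch; `nth_root_coprime_case` is a hypothetical name), the root is a single modular exponentiation once $k\equiv n^{-1} \pmod{(p-1)/n}$ is known. With $n=2$ and $p\equiv 3 \pmod 4$ this gives the same root as the formula $x\equiv a^{(p+1)/4}$ discussed above.

```python
from math import gcd


def nth_root_coprime_case(a, n, p):
    """x with x^n = a (mod p), assuming n | p-1, gcd(n, (p-1)/n) = 1 and a is an nth power."""
    r = (p - 1) // n
    assert (p - 1) % n == 0 and gcd(n, r) == 1 and pow(a, r, p) == 1
    k = pow(n, -1, r) if r > 1 else 0     # n*k = 1 (mod (p-1)/n)
    return pow(a, k, p)


print(nth_root_coprime_case(5, 3, 13))   # 8, and 8^3 = 512 = 5 (mod 13)
print(nth_root_coprime_case(2, 2, 7))    # 4, and 4^2 = 16 = 2 (mod 7)
```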


When $(n,\frac{p-1}{n})>1$ this method does not directly give a solution of $x^n\equiv a \pmod p$, but we can show that the problem is equivalent to one where $a$ comes from a restricted set. For example, if $n=2$ and $p\equiv 5 \pmod 8$, then $(2,\frac{p-1}{4}) = 1$, so there exist integers $k$ and odd $m$ for which $2k+m\frac{p-1}{4} = 1$. Therefore $(a^k)^2\equiv ab^m \pmod p$ where $b\equiv a^{-\frac{p-1}{4}} \pmod p$. Now $b^2\equiv a^{-\frac{p-1}{2}}\equiv 1 \pmod p$ and so $b\equiv\pm 1 \pmod p$. If $a^{\frac{p-1}{4}}\equiv b\equiv 1 \pmod p$, then $(a^k)^2\equiv a \pmod p$. If $a^{\frac{p-1}{4}}\equiv b\equiv -1 \pmod p$ and $y^2\equiv -1 \pmod p$, then $(ya^k)^2\equiv(-1)a(-1)^m = a \pmod p$. So we have reduced the difficulty to finding the square root of $-1 \pmod p$, a seemingly easier question than the original. Gauss generalized this argument as follows:
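Here is a Python sketch of this reduction for $p\equiv 5 \pmod 8$ (Python 3.8+ for negative exponents in `pow`; the function name is hypothetical, and a square root $i$ of $-1$ is assumed to be supplied by the caller). It takes $m=1$ and $k = (1-\frac{p-1}{4})/2$, which satisfy $2k+m\frac{p-1}{4} = 1$:

```python
def sqrt_mod_5_mod_8(a, p, i):
    """Square root of a mod a prime p = 5 (mod 8), given i with i^2 = -1 (mod p).

    Assumes a is a nonzero square mod p.
    """
    assert p % 8 == 5 and pow(a, (p - 1) // 2, p) == 1 and i * i % p == p - 1
    r = (p - 1) // 4                  # odd, since p = 5 (mod 8)
    k = (1 - r) // 2                  # then 2k + 1*r = 1, i.e. m = 1
    x = pow(a, k, p)                  # x^2 = a * b with b = a^(-r) = +-1 (mod p)
    if pow(a, r, p) == 1:             # b = 1: x already works
        return x
    return i * x % p                  # b = -1: multiply by a square root of -1


print(sqrt_mod_5_mod_8(5, 29, 12))    # 11, and 11^2 = 121 = 5 (mod 29)
```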


Proposition 7.21.2. Suppose that $n$ divides $p-1$ and that $a$ is an $n$th power mod $p$. Let $N$ be the smallest positive integer dividing $p-1$ such that $(n,\frac{p-1}{N}) = 1$ (so that $n$ divides $N$, and $N$ has the same prime factors as $n$). There exists a solution $b \pmod p$ to $b^{N/n}\equiv 1 \pmod p$ such that the solutions $x \pmod p$ to $x^n\equiv a \pmod p$ are in 1-to-1 correspondence with the solutions $y \pmod p$ to $y^n\equiv b \pmod p$.


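Proof. Let $b \equiv a^{\frac{p-1}{N}} \pmod p$ so that $b^{N/n} \equiv a^{\frac{p-1}{n}} \equiv 1 \pmod p$. If $x^n \equiv a \pmod p$, let $y \equiv x^{\frac{p-1}{N}} \pmod p$ so that $y^n \equiv (x^n)^{\frac{p-1}{N}} \equiv a^{\frac{p-1}{N}} \equiv b \pmod p$.

Now suppose that $y^n \equiv b \pmod p$. Select integers $m$ and $k$ for which $kn + m\frac{p-1}{N} = 1$ and let $x \equiv y^m a^k \pmod p$. Therefore $a^{kn} = a^{1 - m\frac{p-1}{N}} = a\,(a^{\frac{p-1}{N}})^{-m} \equiv ab^{-m} \pmod p$ and so

$$x^n \equiv (y^m a^k)^n \equiv (y^n)^m \cdot a^{kn} \equiv b^m \cdot ab^{-m} \equiv a \pmod p.$$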


Exercise 7.21.1. Show that $N$ is the largest divisor of $p - 1$ with exactly the same prime factors as $n$.


Example. To solve $x^2 \equiv a \pmod{29}$ where $a^{14} \equiv 1 \pmod{29}$, we have $n = 2$ and $N = 4$, and we take $b \equiv a^7 \pmod{29}$, so that $b^2 \equiv 1 \pmod{29}$ and therefore $b \equiv 1$ or $-1 \pmod{29}$. By Proposition 7.21.2 we need to solve $y^2 \equiv b \pmod{29}$. If $b \equiv 1 \pmod{29}$, then $y \equiv 1$ or $-1 \pmod{29}$. If $b \equiv -1 \pmod{29}$, then $y \equiv -12$ or $12 \pmod{29}$ (which requires some calculation).

Now $2k + 7m = 1$ has the solution $m = 1$, $k = -3$, so that $x \equiv ya^{-3} \pmod{29}$. So if $a^7 \equiv 1 \pmod{29}$, then $x \equiv \pm a^{-3} \pmod{29}$; and if $a^7 \equiv -1 \pmod{29}$, then $x \equiv \pm 12a^{-3} \pmod{29}$.


Example. Solve $x^3 \equiv 31 \pmod{37}$. Here $p - 1 = 36$, $n = 3$, $N = 9$, and $3k + 4m = 1$ so we can take $m = 1$, $k = -1$. Now $b \equiv 31^4 \equiv 1 \pmod{37}$, and so we need to solve $y^3 \equiv 1 \pmod{37}$. Now 37 divides 111 which divides 999 so that $10^3 \equiv 1 \pmod{37}$ and therefore $y \equiv 1, 10$, or $10^2 \equiv -11 \pmod{37}$. Therefore $x \equiv y \cdot 31^{-1} \equiv -31y \equiv 6, 23$, or $8 \pmod{37}$.
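The correspondence of Proposition 7.21.2 can be sketched in Python as follows. This is an illustration under simplifying assumptions, not an efficient algorithm: the reduced equation $y^n \equiv b \pmod p$ is solved by exhaustive search, and the pair $(k, m)$ comes from a modular inverse, so it may differ from the pairs chosen in the examples above (any valid pair works). The function name is ours.

```python
from math import gcd

def nth_roots_via_reduction(a, n, p):
    """All x (mod p) with x^n = a (mod p), via the correspondence of Proposition 7.21.2.
    Assumes p prime, n | p - 1 and p not dividing a."""
    assert (p - 1) % n == 0 and a % p != 0
    # N: the smallest divisor of p - 1 with (n, (p-1)/N) = 1
    N = next(d for d in range(1, p) if (p - 1) % d == 0 and gcd(n, (p - 1) // d) == 1)
    M = (p - 1) // N
    b = pow(a, M, p)                    # b = a^((p-1)/N)
    if M == 1:
        k, m = 0, 1                     # kn + mM = 1 trivially
    else:
        k = pow(n, -1, M)               # kn = 1 (mod (p-1)/N); Python 3.8+
        m = (1 - k * n) // M            # exact division, so kn + mM = 1
    # The reduced problem is solved by brute force, purely for illustration.
    ys = [y for y in range(1, p) if pow(y, n, p) == b]
    # x = y^m a^k, with exponents reduced mod p - 1 so that negative m is allowed
    return sorted(pow(y, m % (p - 1), p) * pow(a, k % (p - 1), p) % p for y in ys)

print(nth_roots_via_reduction(31, 3, 37))   # [6, 8, 23], as in the example above
```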




Exercise 7.21.2. Determine the square roots of 3 (mod 37) using the technique above. 


Appendix 7D. Orders for finite groups


We will see in this appendix that the key results of this chapter, like Fermat’s Little 
Theorem and Wilson’s Theorem, can be formulated and proved for general finite 
groups. 


7.22. Cosets of general groups 


Suppose that $G$ is a given finite group, not necessarily commutative (see section 0.11 of appendix 0D for definitions). We explored the cosets of additive groups in appendix 2A, and now we look at the cosets in an arbitrary group. If $H$ is a subgroup of $G$, then we define a left coset to be the set $a * H = \{a * h : h \in H\}$ for any $a \in G$. Right cosets are of the form $H * a = \{h * a : h \in H\}$, and these are indistinguishable if $G$ is a commutative group.


Proposition 7.22.1. Let $H$ be a subgroup of $G$. Any two left cosets of $H$ in $G$ are either disjoint or identical. Moreover if $G$ is finite, then the left cosets partition $G$, so that $|H|$ divides $|G|$.


This generalizes Theorem and Proposition and their proofs: 


Proof. We begin by showing that $a * H$ and $b * H$ are either disjoint or identical. For if they are not disjoint, then there exist elements $h_1, h_2 \in H$ such that $a * h_1 = b * h_2$. Therefore $b = b * h_2 * (h_2)^{-1} = a * h_1 * (h_2)^{-1}$ so that $b = a * h_0 \in a * H$ where $h_0 := h_1 * (h_2)^{-1} \in H$ since $H$ is closed under the operation "$*$". If $g \in b * H$, then $g = b * h$ for some $h \in H$, and so $g = (a * h_0) * h = a * (h_0 * h) \in a * H$ by associativity and the closure of $H$. Hence $b * H \subseteq a * H$. By an analogous proof we have $a * H \subseteq b * H$, and hence $a * H = b * H$.

If $G$ is finite, let $a_1 * H, \dots, a_k * H$ be a maximal set of disjoint cosets of $H$ inside $G$. Their union must equal $G$ or else there exists $a \in G$ which is in none of these cosets. But then the coset $a * H$, which contains $a = a * 1$, is disjoint from these cosets (by the previous paragraph), which contradicts maximality (since $a * H, a_1 * H, \dots, a_k * H$ would be a larger set of disjoint cosets of $H$). Finally, each coset $a * H$ has exactly $|H|$ elements, since $a * h_1 = a * h_2$ implies $h_1 = h_2$ (multiply on the left by $a^{-1}$), and so $|G| = k|H|$.


Exercise 7.22.1. Show that every element of G belongs to a unique coset of H. 


One can prove the analogous result about right cosets, with the analogous proof. It is tempting to guess that $a * H$ is equal to $H * a$, but that is not true in general; see section 7.24 of this appendix for more on that.
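To see that left and right cosets really can differ, here is a small Python check in the symmetric group $S_3$. Permutations of $\{0, 1, 2\}$ are stored as tuples of images and composed right to left; the choice of group and subgroup is ours, for illustration.

```python
from itertools import permutations

def compose(s, t):
    """(s * t)(i) = s(t(i)) for permutations stored as tuples of images."""
    return tuple(s[t[i]] for i in range(len(t)))

G = list(permutations(range(3)))     # the symmetric group S_3, of order 6
H = [(0, 1, 2), (1, 0, 2)]           # the subgroup {identity, swap of 0 and 1}

def left_coset(a):
    return frozenset(compose(a, h) for h in H)

def right_coset(a):
    return frozenset(compose(h, a) for h in H)

left_cosets = {left_coset(a) for a in G}
right_cosets = {right_coset(a) for a in G}
print(len(left_cosets), sorted(map(len, left_cosets)))   # 3 [2, 2, 2]
print(left_coset((0, 2, 1)) == right_coset((0, 2, 1)))   # False
```

The left cosets partition $G$ into three pieces of size $|H| = 2$, as Proposition 7.22.1 predicts, yet the coset containing the swap of 1 and 2 differs on the two sides.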


7.23. Lagrange and Wilson 


Given the group operation $*$, we denote $g * g$ by $g^2$, and $g * g * g$ by $g^3$, etc. An element $a$ has order $m$ in $G$ if $m$ is the least positive integer for which $a^m = 1$ (if such an integer exists).


Exercise 7.23.1. Prove that if $a$ has order $m$ in $G$, then $H := \{1, a, a^2, \dots, a^{m-1}\}$ is a subgroup of $G$.


Theorem 7.1 was used to prove Fermat's Little Theorem. Exercise 7.23.1 implies the following generalization of Fermat's Little Theorem:


Corollary 7.23.1 (Lagrange's Theorem). For any element $a$ of any finite group $G$ we have $a^{|G|} = 1$.


Proof. Let $m$ be the order of $a$ in $G$, that is, the least positive integer for which $a^m = 1$. Then $H := \{1, a, a^2, \dots, a^{m-1}\}$ is a subgroup of $G$ by exercise 7.23.1. We deduce that $m = |H|$ divides $|G|$ by Proposition 7.22.1 and so

$$a^{|G|} = (a^m)^{|G|/m} = 1^{|G|/m} = 1.$$

To deduce Euler's Theorem let $G = (\mathbb{Z}/m\mathbb{Z})^*$ so that $|G| = \phi(m)$.
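As a quick numerical sanity check of Lagrange's Theorem in $G = (\mathbb{Z}/m\mathbb{Z})^*$, the sketch below computes each element's order by brute force and confirms that it divides $\phi(m) = |G|$ and that $a^{\phi(m)} \equiv 1 \pmod m$. It is illustrative only; the function names are ours.

```python
from math import gcd

def units(m):
    """The elements of (Z/mZ)^*, i.e. residues 1 <= a < m coprime to m (m >= 2)."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def order(a, m):
    """Least positive k with a^k = 1 (mod m), for a coprime to m."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def check_lagrange(m):
    """Every order divides |G| = phi(m), and a^phi(m) = 1 (mod m) for every unit a."""
    G = units(m)
    phi = len(G)
    return all(phi % order(a, m) == 0 and pow(a, phi, m) == 1 for a in G)

print(check_lagrange(100))   # True
```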


Exercise 7.23.2. Let $p$ be a prime which does not divide the order of the finite group $G$.
(a) Prove that $G$ contains no elements of order $p$.
(b) Let $X = \{(x_1, \dots, x_p) \in G^p : x_1 \cdots x_p = 1\}$, and then use exercise 7.12.1(a) to prove that $|G|^{p-1} \equiv 1 \pmod p$.
(c) Deduce Fermat's Little Theorem by applying (b) to the cyclic groups of order $a$ for $1 \leq a \leq p - 1$.


Theorem 7.10 (Wilson's Theorem for finite abelian groups). The product of the elements of any given finite abelian group $G$ equals 1 unless the group contains exactly one element, $\ell$, of order two, in which case the product equals $\ell$.


Proof. We partition the elements of $G$ of order $> 2$ into subsets of size two, each element with its inverse. The elements in each such subset are distinct, for if $y = x$ and $xy = 1$, then $x^2 = 1$; that is, $x$ has order 1 or 2. Now the two elements of each of these subsets multiply together to give 1, and so the product of all of the elements of order $> 2$ is 1. Therefore we have proved that

$$\prod_{g \in G} g = \prod_{h \in H} h \quad \text{where } H = \{g \in G : g^2 = 1\}.$$


We can see that $H$ is a subgroup of $G$, for if $a, b \in H$, then $(ab)^2 = a(ba)b = a(ab)b = a^2b^2 = 1 \cdot 1 = 1$, using that $G$ is abelian. If $H$ only has one element, 1, or two elements, 1 and $\ell$ (the element of order two), then the result follows immediately. If $H$ has at least two elements of order two, call them $\ell$ and $m$. Then $L := \{1, \ell, m, \ell m\}$ is a subgroup of $H$, and the product of the elements of each coset $xL$ of $L$ in $H$ is

$$x \cdot x\ell \cdot xm \cdot x\ell m = x^4\ell^2m^2 = 1.$$

Therefore the product over all of the elements of $H$, which is the union of the elements of all of the cosets of $L$, is 1.
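Theorem 7.10 can be checked numerically on the abelian groups $(\mathbb{Z}/m\mathbb{Z})^*$. This brute-force sketch (our own illustration) compares the product of all units mod $m$ with the theorem's prediction: the unique element of order two if there is exactly one, and 1 otherwise.

```python
from math import gcd

def wilson_check(m):
    """Return (product of the units mod m, prediction of Theorem 7.10), for m >= 2."""
    G = [a for a in range(1, m) if gcd(a, m) == 1]
    product = 1
    for g in G:
        product = product * g % m
    order_two = [g for g in G if g != 1 and g * g % m == 1]   # elements of order exactly 2
    predicted = order_two[0] if len(order_two) == 1 else 1
    return product, predicted

print(wilson_check(7), wilson_check(15))   # (6, 6) (1, 1)
```

For $m = 7$ this is the classical Wilson's Theorem, $6! \equiv -1$; for $m = 15$ there are three elements of order two (4, 11, 14), so the product is 1.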


7.24. Normal subgroups 


A group $G$ is cyclic if there exists an element $g \in G$ such that $G = \{g^m : m = 0, 1, \dots\}$, and we say that $G$ is generated by $g$. The additive group $\mathbb{Z}/m\mathbb{Z}$ is cyclic; the elements of the group are precisely the multiples of the generator 1. The primitive roots of appendix 7B are precisely the generators of the multiplicative group $(\mathbb{Z}/m\mathbb{Z})^*$, when it is cyclic.


Exercise 7.24.1. Prove that every finite cyclic group is isomorphic to some Z/nZ. 
Exercise 7.24.2. Prove that if |G| is a prime, then G is cyclic. 


Exercise 7.24.3. Show that the product of the elements in a finite cyclic group $G$ is 1 if $|G|$ is odd, and equals the (unique) element of $G$ of order two if $|G|$ is even.


If $H$ is a subgroup of $G$ for which $a * H = H * a$ for all $a \in G$, then we call $H$ a normal subgroup of $G$, denoted $H \triangleleft G$.¹⁰ Normal subgroups are useful not only because the left and right cosets are the same, but also because one can then make sense of the quotient group $G/H$ since then

$$(a * H) * (b * H) = a * (H * b) * H = a * (b * H) * H = (a * b) * H.$$


There are two “trivial” normal subgroups {1} and G of any group G (which reminds 
one of the “trivial” factors of any integer n). If there are no other normal subgroups 
of G, other than {1} and the group itself, then we call G a simple group (which is 
analogous to the definition of a prime number). 


For any non-simple finite group $G$ there is a maximal proper normal subgroup $H$ of $G$, and one can prove that the quotient group $G/H$ is simple. But then either $H$ is simple or it has a maximal proper normal subgroup $L$, say, such that $H/L$ is simple. We can iterate this construction, and in the case of a finite group we have

$$H_k = \{1\} \triangleleft H_{k-1} \triangleleft H_{k-2} \triangleleft \cdots \triangleleft H_1 \triangleleft G = H_0.$$

This process is finite since the $|H_i|$ are strictly decreasing. The quotient groups

$$H_0/H_1,\ H_1/H_2,\ \dots,\ H_{k-1}/H_k$$

are all simple groups. There may be more than one maximal proper normal subgroup, and so we might find ourselves with different possible sequences of quotient groups. However the Jordan-Hölder Theorem states that the eventual list of finite simple groups is unique, other than the order in which they arise. This is analogous to what happens when we factor large integers, and so the Jordan-Hölder Theorem is a wonderful and surprising generalization of the Fundamental Theorem of Arithmetic. To properly understand this would take us too far afield of number theory.

10 The name "normal" is somewhat misleading since most subgroups of most groups are not normal!
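For cyclic groups every chain can be listed explicitly. The sketch below relies on the standard fact that a cyclic group of order $o$ has exactly one subgroup of each order dividing $o$, so a maximal proper subgroup has order $o/q$ for a prime $q$, and the corresponding quotient is cyclic of order $q$. It lists, for every maximal chain in $\mathbb{Z}/n\mathbb{Z}$, the orders of the successive simple quotients; the function names are ours.

```python
def prime_divisors(m):
    """The distinct prime factors of m >= 1, by trial division."""
    ps, q = [], 2
    while q * q <= m:
        if m % q == 0:
            ps.append(q)
            while m % q == 0:
                m //= q
        q += 1
    if m > 1:
        ps.append(m)
    return ps

def composition_factors(n):
    """For each maximal chain of subgroups of Z/nZ, the orders of the successive quotients."""
    if n == 1:
        return [[]]
    return [[q] + rest for q in prime_divisors(n) for rest in composition_factors(n // q)]

print(composition_factors(12))   # [[2, 2, 3], [2, 3, 2], [3, 2, 2]]
```

The three chains give the quotients in different orders but always the same list $\{2, 2, 3\}$, which is the factorization $12 = 2^2 \cdot 3$.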


Exercise 7.24.4. Prove that if G is a finite, simple, abelian group, then G is isomorphic to the 
additive, cyclic group Z/pZ, where p is a prime. 


Exercise 7.24.5.† By taking $G = \mathbb{Z}/n\mathbb{Z}$ deduce the Fundamental Theorem of Arithmetic from the Jordan-Hölder Theorem.


Feit and Thompson showed that every non-cyclic finite simple group has even order.


Appendix 7E. Constructing finite fields


7.25. Classification of finite fields 


We have already explored the finite fields $\mathbb{F}_p = \{0, 1, \dots, p - 1\}$, though in the guise of the additive and multiplicative groups of residues mod $p$. In this section we determine all the other possible finite fields. To do so we need two field properties:

A field $F$ has no zero divisors; that is, if $ab = 0$ with $a, b \in F$, then either $a = 0$ or $b = 0$. This holds since the non-zero elements of the field form a multiplicative group and so are closed under multiplication. Moreover if $F$ is a finite field, then $|F| \cdot 1 = 0$ by applying Lagrange's Theorem (Corollary 7.23.1) to the additive group of $F$. These properties make it surprisingly easy to determine all finite fields.


Exercise 7.25.1. Let $F$ be a finite field.
(a) Show that if prime $q$ divides $|F|$, then either $q \cdot 1 = 0$ or $\frac{|F|}{q} \cdot 1 = 0$. Use an induction hypothesis to deduce that there exists a prime $p$ such that $p \cdot 1 = 0$ in $F$.
(b) Show that this prime $p$ is unique.
(c) Begin with a non-zero element $a_1 \in F$. If $a_2 \notin \{n_1a_1 : n_1 \in \mathbb{F}_p\}$, then show that $\{n_1a_1 + n_2a_2 : n_1, n_2 \in \mathbb{F}_p\}$ has $p^2$ distinct elements.
(d) Deduce by induction that there exist $a_1, \dots, a_r \in F$ for some integer $r \geq 1$ such that

$$F = \{n_1a_1 + n_2a_2 + \cdots + n_ra_r : n_1, \dots, n_r \in \mathbb{F}_p\},$$

the elements $n_1a_1 + n_2a_2 + \cdots + n_ra_r$ being all distinct, and so $F$ has $p^r$ elements.


We have shown that all finite fields must have $p^r$ elements for some prime power $p^r$. We know that there are fields of size $p$, the rings $\mathbb{Z}/p\mathbb{Z}$ often denoted $\mathbb{F}_p$, and we ask whether there are fields of size $p^r$ for every $r \geq 2$.


The easiest way to construct a finite field of $p^r$ elements is to use a root $\alpha$ of a polynomial $f(x)$ of degree $r$ which is irreducible in $\mathbb{F}_p[x]$. (We showed that such polynomials exist in section 4.12 of appendix 4C; it is much more challenging to actually determine an example of such a polynomial.) Then we can represent the elements of the finite field as

$$a_0 + a_1\alpha + \cdots + a_{r-1}\alpha^{r-1} \quad \text{where we take the } a_i \in \mathbb{F}_p.$$
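The construction can be tried out in Python. In this sketch (our own representation, not the book's) the element $a_0 + a_1\alpha + \cdots + a_{r-1}\alpha^{r-1}$ is stored as the tuple $(a_0, \dots, a_{r-1})$, $f$ is a monic coefficient list with lowest degree first, and products are reduced using $f(\alpha) = 0$. The exhaustive check that every non-zero element is invertible is only practical for tiny $p^r$.

```python
from itertools import product

def mulmod(u, v, f, p):
    """Multiply u, v in F_p[x]/(f), for monic f of degree r given as a list of r+1 coefficients."""
    r = len(f) - 1
    w = [0] * (2 * r)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] = (w[i + j] + a * b) % p
    for k in range(2 * r - 1, r - 1, -1):   # rewrite x^k using x^r = -(f_0 + ... + f_{r-1}x^{r-1})
        c, w[k] = w[k], 0
        for i in range(r):
            w[k - r + i] = (w[k - r + i] - c * f[i]) % p
    return tuple(w[:r])

def is_field(f, p):
    """True if every non-zero element of F_p[x]/(f) has a multiplicative inverse."""
    r = len(f) - 1
    elements = list(product(range(p), repeat=r))
    one = (1,) + (0,) * (r - 1)
    return all(any(mulmod(u, v, f, p) == one for v in elements)
               for u in elements if any(u))

print(is_field([1, 0, 1], 3), is_field([2, 0, 1], 3))   # True False
```

With the irreducible $x^2 + 1$ over $\mathbb{F}_3$ we get the field of 9 elements, while the reducible $x^2 - 1 = (x - 1)(x + 1)$ produces zero divisors, so irreducibility is essential.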


Exercise 7.25.2. Verify that this indeed gives a field with $p^r$ distinct elements.


We will write $q = p^r$ for convenience. The given proof of Proposition 7.4.1, suitably modified, works in any finite field, so we may state the following:


Proposition 7.25.1 (Lagrange). Let $q$ be a prime power and $F$ be a field with $q$ elements. Any non-zero polynomial $f(x) \in F[x]$ of degree $d$ has no more than $d$ roots in $F$.


The multiplicative group of $F$ has $q - 1$ elements so that $a^{q-1} = 1$ for all $a \in F^*$ by Lagrange's Theorem, Corollary 7.23.1. Lagrange's Proposition, Proposition 7.25.1, therefore implies that these are all of the roots of the polynomial $x^{q-1} - 1$ and so

$$x^{q-1} - 1 = \prod_{\substack{a \in F \\ a \neq 0}} (x - a). \tag{7.25.1}$$

Therefore the finite field of $q$ elements is unique (up to isomorphism)! We denote it by $\mathbb{F}_q$.

Exercise 7.25.3. (a) Prove that if $d \mid p^r - 1$ for some integer $r \geq 1$, then there are precisely $\phi(d)$ elements in $\mathbb{F}_{p^r}$ of order $d$.
(b) Deduce that $\mathbb{F}_q^*$ is a cyclic group (and therefore has a generator/primitive root) for any prime power $q$.


Exercise 7.25.4. Show that the finite field of $p^2$ elements is not isomorphic to the integers mod $p^2$ (that is, $\mathbb{F}_{p^2} \not\cong \mathbb{Z}/p^2\mathbb{Z}$).


Let $\mathcal{N}_m$ be the set of irreducible, monic polynomials in $\mathbb{F}_p[x]$ of degree $m$, not including the polynomial $x$, and let $N_m = |\mathcal{N}_m|$. Above we showed that if $f(x) \in \mathcal{N}_r$ and $f(\alpha) = 0$, then we can construct the unique field with $q = p^r$ elements, $\mathbb{F}_q$, by taking all polynomials in $\alpha$ with coefficients in $\mathbb{F}_p$; in other words $\mathbb{F}_q = \mathbb{F}_p[\alpha]$. This implies that $\alpha \in \mathbb{F}_q$, and $\alpha \neq 0$ as $f(x) \neq x$, so that $\alpha^{q-1} - 1 = 0$, and therefore $\alpha$ is a root of $x^{q-1} - 1$. We deduce that $f(x)$ is a factor of $x^{q-1} - 1$, and as the polynomials of $\mathcal{N}_r$ cannot share any roots (as they are irreducible), we deduce that their product

$$\Phi_r(x) := \prod_{f(x) \in \mathcal{N}_r} f(x) \quad \text{divides} \quad x^{p^r - 1} - 1.$$


Now if $d$ divides $r$, then $p^d - 1$ divides $p^r - 1$, and so $x^{p^d - 1} - 1$ divides $x^{p^r - 1} - 1$. We deduce that $\Phi_d(x)$ divides $x^{p^r - 1} - 1$. Since the elements of the different $\mathcal{N}_d$ must be different, we deduce that

$$\prod_{d \mid r} \Phi_d(x) \quad \text{divides} \quad x^{p^r - 1} - 1.$$

The degree of $\Phi_m(x)$ is $mN_m$ by definition, and so

$$\text{the degree of } \prod_{d \mid r} \Phi_d(x) \text{ equals } \sum_{d \mid r} dN_d = p^r - 1$$

by (4.12.2).¹¹ This is the same as the degree of $x^{p^r - 1} - 1$, and so the two polynomials are equal, as one divides the other, they both have the same degree, and they are both monic. Moreover if we multiply through by $x$, then we obtain that

$$x^{p^r} - x = \prod_{\substack{f(x) \in \mathbb{F}_p[x] \\ f(x) \text{ monic, irreducible,} \\ \text{of degree dividing } r}} f(x). \tag{7.25.2}$$

We complete this survey of finite fields by showing that if $f(x) \in \mathbb{F}_p[x]$ is a monic irreducible polynomial of degree $r$, with root $\alpha \in \mathbb{F}_q$ (where $q = p^r$), then¹²

$$f(x) = (x - \alpha)(x - \alpha^p)(x - \alpha^{p^2}) \cdots (x - \alpha^{p^{r-1}}).$$


Proof. Write $f(x) = b_0 + b_1x + \cdots + b_rx^r \in \mathbb{F}_p[x]$ so that

$$b_0 + b_1\alpha + \cdots + b_r\alpha^r = f(\alpha) = 0.$$

Taking the $p$th power of both sides and using the Child's binomial theorem, we obtain

$$b_0^p + b_1^p\alpha^p + \cdots + b_r^p\alpha^{pr} = (b_0 + b_1\alpha + \cdots + b_r\alpha^r)^p = 0^p = 0 \text{ in } \mathbb{F}_p[\alpha].$$

But each $b_i^p = b_i$ by Fermat's Little Theorem, and so

$$f(\alpha^p) = b_0 + b_1\alpha^p + \cdots + b_r(\alpha^p)^r = 0.$$

One can repeat this to show that $\alpha^{p^2}$ is also a root of $f(x)$, and then $\alpha^{p^3}, \dots$. Since $\alpha \in \mathbb{F}_q$, we know that $\alpha^{p^r} = \alpha^q = \alpha$. We claim that

$$\alpha, \alpha^p, \alpha^{p^2}, \alpha^{p^3}, \dots, \alpha^{p^{r-1}}$$

are the distinct roots of $f$, for if $\alpha^{p^i} = \alpha^{p^j}$ with $0 \leq i < j \leq r - 1$, then, by taking the $p^{r-i}$th power of both sides, we obtain $\alpha^{p^d} = \alpha$ where $d = j - i$, so that $1 \leq d \leq r - 1$. This implies that $f(x)$ divides $x^{p^d} - x$, which contradicts the formula in (7.25.2) with $r$ replaced by $d$, as $f(x)$ is irreducible of degree $r > d$. The result then follows from Proposition 7.25.1.
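The claim can be illustrated in a small field. Assuming $f$ is monic and irreducible of degree $r \geq 2$, the sketch below works in $\mathbb{F}_p[x]/(f)$ with our own tuple representation, takes $\alpha$ to be the class of $x$, and computes $\alpha, \alpha^p, \dots, \alpha^{p^{r-1}}$ by repeated multiplication (fine for small $p$). It then checks that these are distinct roots of $f$ and that $\alpha^{p^r} = \alpha$.

```python
def mulmod(u, v, f, p):
    """Product of u, v in F_p[x]/(f); elements are r-tuples of coefficients, lowest degree first."""
    r = len(f) - 1
    w = [0] * (2 * r)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] = (w[i + j] + a * b) % p
    for k in range(2 * r - 1, r - 1, -1):   # use x^r = -(f_0 + ... + f_{r-1}x^{r-1})
        c, w[k] = w[k], 0
        for i in range(r):
            w[k - r + i] = (w[k - r + i] - c * f[i]) % p
    return tuple(w[:r])

def power(u, e, f, p):
    """u^e in F_p[x]/(f), by repeated multiplication."""
    result = (1,) + (0,) * (len(f) - 2)
    for _ in range(e):
        result = mulmod(result, u, f, p)
    return result

def evaluate(f, u, p):
    """f(u) in F_p[x]/(f), by Horner's rule."""
    acc = (0,) * (len(f) - 1)
    for coeff in reversed(f):
        acc = mulmod(acc, u, f, p)
        acc = ((acc[0] + coeff) % p,) + acc[1:]
    return acc

def conjugates(f, p):
    """alpha, alpha^p, ..., alpha^(p^(r-1)), where alpha is the class of x (needs r >= 2)."""
    r = len(f) - 1
    roots = [(0, 1) + (0,) * (r - 2)]
    for _ in range(r - 1):
        roots.append(power(roots[-1], p, f, p))
    return roots

print(conjugates([1, 1, 0, 0, 1], 2))   # [(0, 1, 0, 0), (0, 0, 1, 0), (1, 1, 0, 0), (1, 0, 1, 0)]
```

For $f(x) = x^4 + x + 1$ over $\mathbb{F}_2$ the four conjugates are $\alpha$, $\alpha^2$, $\alpha^4 = \alpha + 1$ and $\alpha^8 = \alpha^2 + 1$.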


Exercise 7.25.5. Fix prime $p$. Suppose that the sequence $(u_n)_{n \geq 1}$ of integers satisfies the linear recurrence $u_{n+d} \equiv a_{d-1}u_{n+d-1} + \cdots + a_0u_n \pmod p$, where $d$ is minimal.
(a) Suppose that $u_{n+k} \equiv b_{k-1}u_{n+k-1} + \cdots + b_0u_n \pmod p$. Prove that either $u_n \equiv 0 \pmod p$ for all $n$ or $f(x) := x^d - a_{d-1}x^{d-1} - \cdots - a_1x - a_0$ divides $x^k - b_{k-1}x^{k-1} - \cdots - b_0 \bmod p$.
(b) Prove that if $f(x) \pmod p$ is irreducible and $u_j \not\equiv 0 \pmod p$ for some $j$, $0 \leq j \leq d - 1$, then $(u_n)_{n \geq 1}$ is periodic mod $p$ with period $p^d - 1$.


This implies that the upper bound in exercise 2.5.21 is best possible.


Further reading on arithmetic associated with finite fields 


[1] Jean-Pierre Serre, On a theorem of Jordan, Bull. Amer. Math. Soc. 40 (2003), 429-440. 
[2] Neal Koblitz, Why study equations over finite fields?, Math. Mag. 55 (1982), 144-149. 


11 The value of $N_1$ here is one less than in section 4.12 of appendix 4C, since here we exclude the polynomial $x$ from our count. Otherwise the $N_d$ are all the same.

12 The map $\alpha \mapsto \alpha^p$ of roots is usually credited to Frobenius from the end of the 19th century. However this was discussed by Gauss, almost a hundred years earlier, in one of the chapters of his Disquisitiones that was discarded by the printer and not published until after Gauss's death.




7.26. The product of linear forms in $\mathbb{F}_q$


Any line in the $(x, y)$-plane in $\mathbb{F}_q$ that goes through the origin takes the form $ax + by = 0$ for some $a, b \in \mathbb{F}_q$, not both 0. These lines are not all distinct since, for example, $2x + 6y = 0$ gives the same line as $x + 3y = 0$, so we need to take account of scalars, by noting that distinct lines have different $a : b$ ratios. If $a \neq 0$, then any such ratio is proportional to a ratio of the form $1 : r$ for some $r \in \mathbb{F}_q$ (and these are all different); if $a = 0$, then any such ratio is proportional to $0 : 1$. Therefore there are $q + 1$ distinct such lines. The product of these, up to a scalar, is therefore

$$\prod_{a \in \mathbb{F}_q} (x - ay) \cdot y = x(x^{q-1} - y^{q-1})\,y = x^q y - y^q x,$$

which we prove by replacing $x$ with $x/y$ in (7.25.1) and multiplying through by $xy^q$.
The set of ratios $a : b$ is really a 1-dimensional set, which we call projective space, denoted $\mathbb{P}^1(\mathbb{F}_q)$. Therefore we can rewrite the above as

$$\prod_{(a,b) \in \mathbb{P}^1(\mathbb{F}_q)} (ax + by) \quad \text{is proportional to} \quad xy^q - yx^q. \tag{7.26.1}$$

The right-hand side is a polynomial which equals the determinant of the matrix

$$\begin{pmatrix} x & y \\ x^q & y^q \end{pmatrix}.$$


We can determine the factorization of this determinant directly because for any $a, b \in \mathbb{F}_q$, we have

$$\begin{pmatrix} x & y \\ x^q & y^q \end{pmatrix}\begin{pmatrix} a & 0 \\ b & 1 \end{pmatrix} = \begin{pmatrix} ax + by & y \\ ax^q + by^q & y^q \end{pmatrix} = \begin{pmatrix} ax + by & y \\ (ax + by)^q & y^q \end{pmatrix}$$

since $(ax + by)^q = (ax)^q + (by)^q = ax^q + by^q$ by the Child's binomial theorem (as $q$ is a power of $p$) and then Fermat's Little Theorem. Therefore $ax + by$ divides each term in the first column of this matrix, and so $ax + by$ divides its determinant, which is $a$ times our determinant; hence $ax + by$ divides $xy^q - yx^q$ when $a \neq 0$ (and for $(a, b) = (0, 1)$ it is evident that $y$ divides $xy^q - yx^q$). Now each of the factors in the product of the left-hand side of (7.26.1) divides the right-hand side and so their product does, as they are coprime in $\mathbb{F}_q[x, y]$.¹³ By counting degrees we see that the two sides must therefore be equal, up to a scalar multiple.


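13 It is more conventional to consider the polynomials with $a \neq 0$ to belong to $\mathbb{F}_q(y)[x]$ and to establish their coprimality using field properties.

This latter argument generalizes surprisingly easily. The matrix

$$\begin{pmatrix} x & y & z \\ x^q & y^q & z^q \\ x^{q^2} & y^{q^2} & z^{q^2} \end{pmatrix}$$

looks a bit like the Vandermonde matrix though now the changing powers are in the second exponent, not the first. (We could write the top row as $x^{q^0}, y^{q^0}, z^{q^0}$.) As above we can alter the first column by multiplying through by an arbitrary column matrix (while leaving the other two columns alone)

$$\begin{pmatrix} x & y & z \\ x^q & y^q & z^q \\ x^{q^2} & y^{q^2} & z^{q^2} \end{pmatrix}\begin{pmatrix} a & 0 & 0 \\ b & 1 & 0 \\ c & 0 & 1 \end{pmatrix} = \begin{pmatrix} ax + by + cz & y & z \\ ax^q + by^q + cz^q & y^q & z^q \\ ax^{q^2} + by^{q^2} + cz^{q^2} & y^{q^2} & z^{q^2} \end{pmatrix} = \begin{pmatrix} ax + by + cz & y & z \\ (ax + by + cz)^q & y^q & z^q \\ (ax + by + cz)^{q^2} & y^{q^2} & z^{q^2} \end{pmatrix}$$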


and so $ax + by + cz$ is a factor of the determinant. Therefore the product of all such (distinct) factors divides the determinant. In this case we need 2-dimensional projective space, $\mathbb{P}^2(\mathbb{F}_q)$, the set of $(a, b, c) \neq (0, 0, 0)$ that are distinct up to a scalar multiple. By a similar analysis one can show that this set contains $q^2 + q + 1$ elements, which is the degree of the determinant of our matrix (this can be seen by considering the contribution to the degree from each row, given the evident fact that there is no cancellation when we expand the usual $3!$ terms of the determinant). Therefore the determinant of our matrix is equal in $\mathbb{F}_q$ to a scalar multiple times

$$\prod_{(a,b,c) \in \mathbb{P}^2(\mathbb{F}_q)} (ax + by + cz).$$
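The two-variable identity (7.26.1) can be checked by direct multiplication when $q = p$ is prime; we restrict to that case, as an assumption of this sketch, so that arithmetic in $\mathbb{F}_q$ is just arithmetic mod $p$. A homogeneous form of degree $D$ in $x, y$ is stored as the list of its coefficients of $x^iy^{D-i}$ for $i = 0, \dots, D$, and one representative $(a, b)$ is taken for each point of $\mathbb{P}^1(\mathbb{F}_p)$.

```python
def poly_mul(u, v, p):
    """Multiply homogeneous forms over F_p (index i = coefficient of x^i y^(deg - i))."""
    w = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] = (w[i + j] + a * b) % p
    return w

def product_of_lines(p):
    """Product of one linear form ax + by for each point (a : b) of P^1(F_p)."""
    prod = [1, 0]                          # the point (0 : 1), i.e. the form y
    for r in range(p):
        prod = poly_mul(prod, [r, 1], p)   # the point (1 : r), i.e. the form x + r y
    return prod

def determinant_form(p):
    """x y^p - y x^p = det [[x, y], [x^p, y^p]], in the same coefficient convention."""
    c = [0] * (p + 2)
    c[1] = 1                               # x^1 y^p
    c[p] = (c[p] - 1) % p                  # -x^p y^1
    return c

def proportionality_constant(u, v, p):
    """lam != 0 with u = lam * v over F_p (assuming v[1] = 1), or None if there is none."""
    lam = u[1] % p
    ok = lam != 0 and len(u) == len(v) and all((lam * b - a) % p == 0 for a, b in zip(u, v))
    return lam if ok else None

print(proportionality_constant(product_of_lines(5), determinant_form(5), 5))   # 4
```

The constant comes out as $-1$ (here $4 \equiv -1 \pmod 5$), since the product of the linear forms is $x^py - xy^p$.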


Exercise 7.26.1.† Generalize this to the appropriate $n$-by-$n$ determinant in $\mathbb{F}_q$, with proof.


Appendix 7F. Sophie Germain and Fermat's Last Theorem


Sophie Germain, born in 1776 to a wealthy Parisian family, studied and researched mathematics under the pseudonym Antoine LeBlanc as women were not accepted in universities at that time.¹⁴ She wrote to and worked with Lagrange and Legendre on the leading questions of the day, though she was cautious about revealing her true identity. Reading Disquisitiones in 1804, she wrote to Gauss, sharing her ideas, and became a regular correspondent.


In 1807 Napoleon's troops occupied much of Prussia where Gauss lived. Germain was concerned that Gauss might suffer the fate of Archimedes.¹⁵ Being wealthy and well-connected, she asked General Pernety, a family friend, to ensure Gauss's safety. Gauss appreciated but did not understand such concern for his well-being from a woman he did not know! Three months later, Germain disclosed her true identity to Gauss. He replied:


But how to describe to you my admiration and astonishment at seeing my esteemed correspondent, 
Monsieur LeBlanc, metamorphose himself into this illustrious personage who gives such a brilliant 
example of what I would find it difficult to believe. A taste for the abstract sciences in general and 
above all the mysteries of numbers is excessively rare: one is not astonished at it: the enchanting 
charms of this sublime science reveal only to those who have the courage to go deeply into it. 

But when a person of the sex which, according to our customs and prejudices, must encounter infinitely more difficulties than men to familiarize herself with these thorny researches, succeeds nevertheless in surmounting these obstacles and penetrating the most obscure parts of them, then without doubt she must have the noblest courage, quite extraordinary talents and superior genius. Indeed nothing could prove to me in so flattering and less equivocal manner that the attractions of this science, which has enriched my life with so many joys, are not chimerical, than the predilection with which you have honored it.


Excerpt from a letter from C. F. Gauss to Sophie Germain (1807)


14 The École Polytechnique, which opened when Germain was 18, did not accept women, but she was able to study by correspondence.

15 Archimedes died in 212 B.C. when Romans captured his home city of Syracuse. Archimedes 
had just drawn a mathematical diagram in the sand when a Roman soldier asked him to get up to 
be arrested. Archimedes ignored him, wishing to finish what he was doing. The Roman soldier lost 
patience and ran Archimedes through with his sword. 




7.27. Fermat’s Last Theorem and Sophie Germain 


The first strong result for general exponents in Fermat’s Last Theorem was: 


Theorem 7.11 (Sophie Germain's Theorem). Suppose that $p$ is an odd prime and $q = 2p + 1$ is also prime. If $x$, $y$, and $z$ are integers for which $x^p + y^p + z^p = 0$, then $p$ divides $x$, $y$, or $z$.


We will need the following simple lemma. 
Lemma 7.27.1. Suppose that $p$ is an odd prime for which $q = 2p + 1$ is also prime. If $a, b, c$ are coprime integers with $a^p + b^p + c^p \equiv 0 \pmod q$, then $q$ divides $abc$.


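Proof. If $n$ is not divisible by $q$, then $n^p = n^{(q-1)/2} \equiv -1$ or $1 \pmod q$, since its square is $n^{q-1} \equiv 1 \pmod q$. Therefore if $q$ does not divide $abc$, then $a^p, b^p, c^p \equiv -1$ or $1 \pmod q$, and so $a^p + b^p + c^p \equiv -3, -1, 1$, or $3 \pmod q$. This is impossible as $q = 2p + 1 > 3$.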
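The two facts used in this proof can be confirmed numerically for a few Sophie Germain pairs $p, q = 2p + 1$. This small check (with our own helper names) is illustrative only and proves nothing beyond the cases tested.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def check_germain_lemma(p):
    """For an odd prime p with q = 2p + 1 prime: n^p = +1 or -1 (mod q) whenever q does not
    divide n, and no sum of three such values is 0 (mod q)."""
    q = 2 * p + 1
    assert p % 2 == 1 and is_prime(p) and is_prime(q)
    values = {pow(n, p, q) for n in range(1, q)}
    assert values == {1, q - 1}
    return all((s + t + u) % q != 0 for s in values for t in values for u in values)

print(all(check_germain_lemma(p) for p in [3, 5, 11, 23, 29]))   # True
```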


Proof of Sophie Germain's Theorem. Assume that there is a solution in which $p$ does not divide $x$, $y$, $z$. In the proof of Proposition 6.4.1 we saw that we may assume $x, y, z$ are pairwise coprime so, by Lemma 7.27.1, exactly one of $x, y, z$ is divisible by $q$. Let us suppose that $q$ divides $x$, since we may rearrange $x, y, z$.

By exercise 7.10.6(b) there exist integers $a, b, c, d$ such that

$$z + y = a^p, \quad z + x = b^p, \quad x + y = c^p, \quad \text{and} \quad \frac{y^p + z^p}{y + z} = d^p \quad \text{where } x = -ad,$$

as $p \nmid xyz$. Now $a^p = z + y \equiv (z + x) + (x + y) = b^p + c^p \pmod q$ as $q \mid x$, and so we see that $q$ divides at least one of $a, b, c$ by the lemma (applied to $a, -b, -c$, as $p$ is odd). However $q$ does not divide $b$ since $(q, b) \mid (x, z + x) = (x, z) = 1$ as $q \mid x$ and $b \mid z + x$; and similarly $q$ does not divide $c$. Hence $q$ divides $a$; that is, $-z \equiv y \pmod q$. But then

$$d^p = \frac{y^p + z^p}{y + z} = \sum_{j=0}^{p-1} (-z)^{p-1-j} y^j \equiv \sum_{j=0}^{p-1} y^{p-1} = py^{p-1} \pmod q.$$

In particular $q \nmid d$. Therefore, as $y \equiv x + y = c^p \pmod q$ and $q - 1 = 2p$, we deduce that

$$4 \equiv 4d^{2p} = (2d^p)^2 \equiv (2py^{p-1})^2 \equiv (-1)^2 (c^{2p})^{p-1} \equiv 1 \pmod q,$$

which is impossible as $q > 3$. Therefore $p$ divides $x$, $y$, or $z$.


The first case of Fermat's Last Theorem is the claim that there do not exist integers $x, y, z$ not divisible by $p$ for which $x^p + y^p + z^p = 0$. Sophie Germain's Theorem implies that if there are infinitely many pairs of primes of the form $p, q = 2p + 1$, then there are infinitely many primes $p$ for which the first case of Fermat's Last Theorem is true. However it is still an open question as to whether there are infinitely many Sophie Germain prime pairs, that is, those of the form $p, q = 2p + 1$.


Subsequently Germain's idea was developed to show that if $m \equiv 2$ or $4 \pmod 6$, then there exists a constant $N_m \neq 0$ such that if $p$ and $q = mp + 1$ are primes for which $q \nmid N_m$, then the first case of Fermat's Last Theorem is true for exponent $p$. Adleman, Fouvry, and Heath-Brown used this to show that the first case of Fermat's Last Theorem is true for infinitely many prime exponents (long before Wiles fully proved Fermat's Last Theorem).


Appendix 7G. Primes of the form $2^n + k$


7.28. Covering sets of congruences 


Are there many primes of the form $k \cdot 2^n + 1$ or $k + 2^n$ or $2^n + k$ for given integer $k$? In this generality it seems like a more difficult question than asking about primes of the form $2^n + 1$, but Erdős showed, ingeniously, how these questions can be resolved for certain integers $k$:

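Let $F_m = 2^{2^m} + 1$ be the Fermat numbers; remember that $F_0, F_1, F_2, F_3, F_4$ are prime and $F_5 = 641 \times 6700417$. Let $k_0 \pmod{F_0F_1F_2F_3F_4F_5}$ be defined by $k_0 \equiv 1 \pmod{641\,F_0F_1F_2F_3F_4}$ and $k_0 \equiv -1 \pmod{6700417}$. Since $2^{2^m} \equiv -1$ modulo $F_m$ (and modulo each prime factor of $F_5$), the powers of 2 repeat with period $2^{m+1}$ modulo these primes, so for any positive integer $k$ such that $k \equiv k_0 \pmod{F_0F_1F_2F_3F_4F_5}$ we have:

• if $n \equiv 1 \pmod 2$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^1 + 1 = F_0 \equiv 0 \pmod{F_0}$;
• if $n \equiv 2 \pmod 4$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^2 + 1 = F_1 \equiv 0 \pmod{F_1}$;
• if $n \equiv 4 \pmod 8$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^4 + 1 = F_2 \equiv 0 \pmod{F_2}$;
• if $n \equiv 8 \pmod{16}$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^8 + 1 = F_3 \equiv 0 \pmod{F_3}$;
• if $n \equiv 16 \pmod{32}$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^{16} + 1 = F_4 \equiv 0 \pmod{F_4}$;
• if $n \equiv 32 \pmod{64}$, then $k \cdot 2^n + 1 \equiv 1 \cdot 2^{32} + 1 = F_5 \equiv 0 \pmod{641}$; and
• if $n \equiv 0 \pmod{64}$, then $k \cdot 2^n + 1 \equiv -1 \cdot 2^0 + 1 = 0 \pmod{6700417}$.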


Every integer $n$ belongs to one of these arithmetic progressions (these are called a covering system of congruences), and so we have exhibited a prime factor of $k \cdot 2^n + 1$ for every integer $n$. Therefore $k \cdot 2^n + 1$ is composite unless it equals that prime factor, which is impossible as each $k \cdot 2^n + 1$ is too large. We deduce that for a positive proportion of integers $\ell$ (that is, the positive integers $\ell \equiv k_0 \pmod{F_6 - 2}$), there is no prime $p$ for which $(p - 1)/\ell$ is a power of 2.
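The covering argument can be checked by machine. The sketch computes $k_0$ by the Chinese Remainder Theorem (assuming Python 3.8+ for `pow(A, -1, B)`) and then, since the powers of 2 modulo each of these primes repeat with period dividing 64, checks a full period of $n$ (the test checks two periods). The names `A`, `B` and the helper functions are ours.

```python
FERMAT_PRIMES = [3, 5, 17, 257, 65537]       # F_0, F_1, F_2, F_3, F_4
A = 641 * 3 * 5 * 17 * 257 * 65537           # k_0 = 1 (mod A)
B = 6700417                                  # k_0 = -1 (mod B); F_5 = 641 * B

def crt_k0():
    """Least positive k_0 with k_0 = 1 (mod A) and k_0 = -1 (mod B)."""
    t = (-2 * pow(A, -1, B)) % B             # then 1 + A*t = -1 (mod B)
    return 1 + A * t

def covering_prime(k, n):
    """A prime from the covering set dividing k * 2^n + 1, or None if there is none."""
    for q in FERMAT_PRIMES + [641, B]:
        if (k * pow(2, n, q) + 1) % q == 0:
            return q
    return None

print(covering_prime(crt_k0(), 32))   # 641
```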




Exercise 7.28.1. Deduce that $k \cdot 2^n + 1$ is composite for every integer $n \geq 0$ (with $k$ as defined above).


Exercise 7.28.2. Prove that there exist infinitely many integers $k$ for which $2^n + k$ is composite for every integer $n \geq 0$. (That is, there is no prime $p$ equal to $k$ plus a power of 2.)


Exercise 7.28.3. Let $k$ be as above. Let $x_n$ be a second-order linear recurrence sequence for which $x_n = 3x_{n-1} - 2x_{n-2}$ for all $n \geq 2$. Show that $x_n$ is composite for all $n \geq 0$ if (a) $x_0 = k + 1$ and $x_1 = k + 2$, or (b) $x_0 = k + 1$ and $x_1 = 2k + 1$.


Exercise 7.28.4. Let $\ell$ be any positive integer for which $\ell \equiv -k \pmod{F_6 - 2}$. Prove that $\ell \cdot 2^n - 1$ and $|2^n - \ell|$ are composite for every integer $n \geq 0$. Deduce that a positive proportion of odd integers $m$ cannot be written in the form $p + 2^n$ with $p$ prime.


Exercise 7.28.5. Prove that 13-20" + 1 is not prime for any k > 1. 


John Selfridge showed that at least one of the primes 3, 5, 7, 13, 19, 37, and 73 divides $78557 \cdot 2^n + 1$ for every integer $n \geq 0$. This is the smallest $k$ known for which $k \cdot 2^n + 1$ is always composite. It is an open problem as to whether this is the smallest such $k$.
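Selfridge's claim is easy to confirm by computer: 2 has order dividing 36 modulo each of the seven primes, so it suffices to check $0 \leq n < 36$. The helper below is our own.

```python
SELFRIDGE_PRIMES = [3, 5, 7, 13, 19, 37, 73]

def divisor_in_set(k, n, primes):
    """A prime from `primes` dividing k * 2^n + 1, or None if there is none."""
    for q in primes:
        if (k * pow(2, n, q) + 1) % q == 0:
            return q
    return None

# 2^36 = 1 modulo every prime in the set, so n = 0, ..., 35 covers all n >= 0.
print([divisor_in_set(78557, n, SELFRIDGE_PRIMES) for n in range(36)])
```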


Exercise 7.28.6. Prove that if $n \geq 3$, then $F_n - 2 = F_0F_1 \cdots F_{n-1}$ cannot be written in the form $p + 2^k + 2^\ell$ where $p$ is prime and $k > \ell \geq 0$.

For $n = 2$ there are the two solutions $F_2 - 2 = 17 - 2 = 5 + 2^3 + 2 = 3 + 2^3 + 2^2$, as well as $F_1 - 2 = 3 = 2 + 1$.


We suspect that there are infinitely many primes in the sequence $2^n - 1$ with $n \geq 1$, the Mersenne primes. And we now have many examples of integers $k$ for which every $2^n + k$ is composite, those integers $k$ for which we can create a covering system. So what about other integers $k$? For example, is $2^n - 3$ infinitely often prime? Calculations suggest there might be infinitely many primes of the form $2^n - 3$, though we cannot compute very far since the numbers grow so rapidly. The most optimistic conjecture would be: Fix integer $k$.

• Either there is a finite set of primes $P$ such that for every positive integer $n$, the number $2^n + k$ is divisible by some element of $P$ (and so is not prime).

• Or there are infinitely many positive integers $n$ for which $2^n + k$ is prime.


7.29. Covering systems for the Fibonacci numbers 


We will show that 727413 cannot be written as a prime plus or minus a Fibonacci number, by using covering systems to show the following.

Theorem 7.12. If $N$ is an integer $\equiv 93687$ or $103377 \pmod M$, where $M = 312018 = 2 \cdot 3 \cdot 7 \cdot 17 \cdot 19 \cdot 23$, then $(N + F_k, M) > 1$ for all integers $k \geq 0$.


Here we let $F_k$ be the $k$th Fibonacci number. In the covering systems of section 7.28 the key idea is that if 2 has order $m$ mod $p$, then the sequence $1, 2, 2^2, 2^3, \dots \pmod p$ has period $m$. Therefore the zeros in the sequence $2^n - 1 \pmod p$ occur every $m$th value of $n$, where $m$ is the period of this sequence. However for the Fibonacci numbers the frequency of the zeros mod $p$ is not necessarily the same as the period of the Fibonacci numbers mod $p$. For example, the period of the Fibonacci numbers mod 3 is

$$0, 1, 1, 2, 0, 2, 2, 1,$$

which has length 8, whereas 3 divides $F_n$ whenever 4 divides $n$. Notice that 1 and 2 both appear three times in the period, which is more often than 0 appears; we will exploit this, and similar repetitions, in our construction:


A Fibonacci covering system 


If $N \equiv 93687 \pmod M$, then the divisibilities lead to a covering system but with several congruence classes to each modulus:

$2 \mid N + F_k$ if and only if $k \equiv 1$ or $2 \pmod 3$,
$3 \mid N + F_k$ if and only if $k \equiv 0 \pmod 4$,
$7 \mid N + F_k$ if and only if $k \equiv 1, 2, 6$, or $15 \pmod{16}$,
$17 \mid N + F_k$ if and only if $k \equiv 0 \pmod 9$,
$19 \mid N + F_k$ if and only if $k \equiv 3, 8$, or $15 \pmod{18}$,
$23 \mid N + F_k$ if and only if $k \equiv 30$ or $42 \pmod{48}$.


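One can check that every residue class $k \pmod{144}$ (where $144$ is the lcm of the moduli 3, 4, 16, 9, 18, and 48) lies in at least one of these classes. This proves our theorem for $N \equiv 93687 \pmod M$, and a similar analysis reveals the proof for $N \equiv 103377 \pmod M$.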
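Theorem 7.12 can be confirmed by computer. Since $(N + F_k, M)$ depends only on $F_k \bmod M$, it is enough to run $k$ over one period of the Fibonacci sequence mod $M$; the sketch computes that period rather than assuming it. The function names are ours.

```python
from math import gcd

M = 2 * 3 * 7 * 17 * 19 * 23             # = 312018

def pisano_period(m):
    """Period of the Fibonacci sequence modulo m (m >= 2)."""
    a, b, n = 0, 1, 0
    while True:
        a, b = b, (a + b) % m
        n += 1
        if (a, b) == (0, 1):
            return n

def always_shares_factor(N, m=M):
    """True if gcd(N + F_k, m) > 1 for every k >= 0 (checked over one period mod m)."""
    a, b = 0, 1                          # F_k and F_{k+1} reduced mod m
    for _ in range(pisano_period(m)):
        if gcd(N + a, m) == 1:
            return False
        a, b = b, (a + b) % m
    return True

print(pisano_period(M))                                       # 144
print(always_shares_factor(93687), always_shares_factor(103377))   # True True
```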


From our theorem we deduce that 727413 cannot be written as $p - F_k$ with $p$ prime, for if so, then $(p, M) = (727413 + F_k, M) > 1$, as $727413 \equiv 103377 \pmod M$, and so $p$ divides $M$. This is impossible or else $23 \geq p = 727413 + F_k > 23$, a contradiction. We also note that 727413 cannot be written as $p + F_k$ with $p$ prime, simply by showing, by computations, that $727413 - F_k$ is composite whenever $F_k < 727413$.


Calculations suggest that 6851 and 7293 might not have a representation as a 
prime plus or minus a Fibonacci number, but there seems little chance of finding 
a covering system for either. The smallest such integer known is 135225 thanks to 
Ismailescu and Shim. 


7.30. The theory of covering systems 


A set of congruence classes

$$a_1 \pmod{m_1}, \dots, a_k \pmod{m_k}$$

is called a covering system if every integer belongs to one of the congruence classes. For example, one might take all the congruence classes mod $m$, and we have seen several examples above, such as $2^{j-1} \pmod{2^j}$ for $1 \leq j \leq J$ together with $0 \pmod{2^J}$.
With Selfridge’s construction one can use 


1 mod 2, 1 mod 3, 2 mod 4, 0 mod 9, 8 mod 12, 6 mod 18, and 12 mod 36. 
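Whether a finite list of congruence classes is a covering system can be decided by checking one full period, namely every $n$ modulo the lcm of the moduli. This short sketch (our own helper names) does that for the two examples above.

```python
from functools import reduce
from math import gcd

def is_covering(classes):
    """classes: list of pairs (a, m) meaning a (mod m). True if every integer lies in
    at least one class; it suffices to test n modulo the lcm of the moduli."""
    period = reduce(lambda x, y: x * y // gcd(x, y), (m for _, m in classes), 1)
    return all(any((n - a) % m == 0 for a, m in classes) for n in range(period))

SELFRIDGE = [(1, 2), (1, 3), (2, 4), (0, 9), (8, 12), (6, 18), (12, 36)]

def fermat_system(J):
    """The classes 2^(j-1) mod 2^j for 1 <= j <= J, together with 0 mod 2^J."""
    return [(2 ** (j - 1), 2 ** j) for j in range(1, J + 1)] + [(0, 2 ** J)]

print(is_covering(SELFRIDGE), is_covering(fermat_system(6)))   # True True
```

Dropping any single class from either system leaves some integer uncovered; for instance without $12 \bmod 36$ the integer 12 is missed.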


We have seen in this appendix some beautiful applications of covering systems to showing that there are no primes in certain sequences. One can also study covering systems for their own sake. The most precise is an exact covering system, in which every integer belongs to exactly one of the congruence classes. If we take each $a_i$ to be the least residue mod $m_i$, then our covering system is exact if and only if

$$\sum_{i=1}^{k} \frac{x^{a_i}}{1 - x^{m_i}} = \frac{1}{1 - x}.$$
Exercise 7.30.1. (a) Prove this.
(b)† Deduce that if $m_1 \le m_2 \le \dots \le m_k$ in an exact covering system, then either $m_k = 1$ or $m_k = m_{k-1}$.
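Read coefficient by coefficient, the identity in Exercise 7.30.1 says that each $n \ge 0$ is counted exactly once, because $x^{a_i}/(1-x^{m_i}) = \sum_{n \ge 0,\ n \equiv a_i \pmod{m_i}} x^n$ when $0 \le a_i < m_i$. The sketch below (hypothetical helper names) counts these multiplicities over one period. It also checks the necessary condition $\sum_i 1/m_i = 1$, which follows by multiplying the identity by $1 - x$ and letting $x \to 1^-$.

```python
from fractions import Fraction
from functools import reduce
from math import lcm


def multiplicities(classes):
    """Number of classes a (mod m) containing n, for 0 <= n < lcm of the moduli."""
    L = reduce(lcm, (m for _, m in classes), 1)
    return [sum((n - a) % m == 0 for a, m in classes) for n in range(L)]


def is_exact(classes):
    return all(c == 1 for c in multiplicities(classes))


exact = [(0, 2), (1, 4), (3, 8), (7, 8)]      # largest modulus 8 appears twice
selfridge = [(1, 2), (1, 3), (2, 4), (0, 9), (8, 12), (6, 18), (12, 36)]

print(is_exact(exact), sum(Fraction(1, m) for _, m in exact))  # True 1
print(is_exact(selfridge))                                     # False (9 is covered twice)
```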


A distinct covering system is one in which the $m_i$ are distinct, so we may write $m_1 < m_2 < \dots < m_k$. In 2015, Bob Hough proved that $m_1 \le 10^{16}$, answering a famous old question of Erdős (whether or not the minimum modulus could be unbounded).$^{16}$ This leaves the Erdős–Selfridge question as to whether there is a distinct covering system in which the moduli are all odd. Schinzel showed that, if so, then for any $f(x) \in \mathbb{Z}[x]$ with $f(0) \ne 0$ and $f(1) \ne -1$, the polynomials $x^n + f(x)$ are irreducible for all $n$ in some arithmetic progression, an intriguing connection.


Interesting articles on covering systems
[1] Bob Hough, Solution of the minimum modulus problem for covering systems, Ann. of Math. (2) 181 (2015), 361–382.

[2] Dan Ismailescu and Peter C. Shim, On numbers that cannot be expressed as a plus-minus weighted sum of a Fibonacci number and a prime, Integers 14 (2014), Paper No. A65, 12 pp.

[3] A. Schinzel, Reducibility of polynomials and covering systems of congruences, Acta Arith. 13 (1967/1968), 91–101.


16 This upper bound has recently been improved to $m_1 \le 616000$ by Balister, Bollobás, Morris, Sahasrabudhe, and Tiba.


Appendix 7H. Further congruences


In this chapter we saw some extraordinary congruences mod p, like Fermat’s Little 
Theorem and Wilson’s Theorem. In this appendix we seek to develop some of these 
elegant congruences to higher powers of p. 


7.31. Fermat quotients 


Fermat’s Little Theorem tells us that $p$ always divides $a^p - a$, but what about divisibility by $p^2$? Experiments indicate that this happens rarely, but it does happen; for example

$p^2$ divides $2^p - 2$ for $p = 1093$ and for $p = 3511$.


These are the only two examples known, despite extensive searching (up to $7.2 \times 10^{16}$), and we have no idea whether there are any more examples. We define the Fermat quotient
$$q_p(a) := \frac{a^{p-1} - 1}{p} \quad \text{for } p \nmid a,$$
and note that $q_p(2) \equiv 0 \pmod p$ if and only if $p^2$ divides $2^p - 2$. It turns out that we can find some interesting congruences for the Fermat quotients mod $p$.
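Since $q_p(a) \bmod p$ is determined by $a^{p-1} \bmod p^2$, a search for primes with $p \mid q_p(2)$ needs only one modular exponentiation per prime. A minimal sketch with a simple sieve (the function names are ours):

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: list of primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def fermat_quotient_mod_p(a, p):
    """q_p(a) mod p for a prime p not dividing a, via a^(p-1) mod p^2."""
    # a^(p-1) = 1 + p*q_p(a), so (a^(p-1) mod p^2 - 1)/p is q_p(a) mod p.
    return (pow(a, p - 1, p * p) - 1) // p


print(fermat_quotient_mod_p(2, 5))   # 3, since q_5(2) = (16 - 1)/5
print([p for p in primes_up_to(10000) if p > 2 and fermat_quotient_mod_p(2, p) == 0])
# [1093, 3511]
```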


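For odd primes $p$, by the binomial theorem and the congruence $\frac1p\binom{p}{m} = \frac1m\binom{p-1}{m-1} \equiv \frac{(-1)^{m-1}}{m} \pmod p$ for $1 \le m \le p-1$, we have
$$\ell_p(1-x) := \frac{(1-x)^p - 1 + x^p}{p} = \frac1p\sum_{m=1}^{p-1}\binom{p}{m}(-x)^m \equiv -\sum_{m=1}^{p-1}\frac{x^m}{m} \pmod p.$$
For example, for $x = -1$ and then for $x = 2$, we obtain
$$\frac{2^p - 2}{p} \equiv -\sum_{m=1}^{p-1}\frac{(-1)^m}{m} \equiv -\sum_{m=1}^{p-1}\frac{2^m}{m} \pmod p.$$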




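The first congruence implies that
$$2q_p(2) = \frac{2^p-2}{p} \equiv \sum_{\substack{1\le m\le p-1\\ m \text{ odd}}}\frac1m - \sum_{\substack{1\le m\le p-1\\ m \text{ even}}}\frac1m = \sum_{1\le m\le p-1}\frac1m - \sum_{n=1}^{(p-1)/2}\frac1n \equiv -\sum_{n=1}^{(p-1)/2}\frac1n \pmod p, \tag{7.31.1}$$
by adding $\sum_{1\le m\le p-1,\ m \text{ even}} 1/m$ to both sums, writing $m = 2n$ in the second sum, and then by Corollary 7.5.2 with $k = -1$.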
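These congruences are easy to test numerically, working with inverses mod $p$ (Python's `pow(m, -1, p)`, available from Python 3.8). The sketch below checks, for small odd primes, that $(2^p-2)/p$, the two sums above, $2q_p(2)$ and the right-hand side of (7.31.1) all agree mod $p$ (helper names ours):

```python
def inv(m, p):
    return pow(m, -1, p)


def check_7311(p):
    """Compare (2^p - 2)/p with the sums above and with (7.31.1), mod an odd prime p."""
    lhs = ((2 ** p - 2) // p) % p                                   # = 2 q_p(2) mod p
    alt = -sum((-1) ** m * inv(m, p) for m in range(1, p)) % p      # x = -1
    two = -sum(pow(2, m, p) * inv(m, p) for m in range(1, p)) % p   # x = 2
    q2 = (pow(2, p - 1, p * p) - 1) // p                            # q_p(2) mod p
    half = -sum(inv(n, p) for n in range(1, (p - 1) // 2 + 1)) % p  # (7.31.1)
    return lhs == alt == two == (2 * q2) % p == half


print(all(check_7311(p) for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]))  # True
```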


By their definitions, we have that
$$\ell_p(1-x) = x\,q_p(x) + (1-x)\,q_p(1-x).$$


Modulo $p$, the Taylor expansion of $\ell_p(1-x)$ is a truncation of the Taylor expansion for the logarithm function,
$$\log(1-x) = -\sum_{m\ge1}\frac{x^m}{m}.$$
We discuss an appropriate way to extend $\ell_p(1-x)$ in this direction in section 16.6.
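There is another connection with the logarithm, since $q_p$ is additive:
$$q_p(a) + q_p(b) \equiv q_p(ab) \pmod p \tag{7.31.2}$$
for any integers $a, b$ not divisible by $p$. To see this, note that
$$1 + p\,q_p(ab) = (ab)^{p-1} = a^{p-1}b^{p-1} = (1 + p\,q_p(a))(1 + p\,q_p(b)) \equiv 1 + p\big(q_p(a) + q_p(b)\big) \pmod{p^2};$$
now subtract 1, and divide by $p$.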
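Here is a quick numerical check of (7.31.2). It computes $q_p(a) \bmod p$ from $a^{p-1} \bmod p^2$, as in the earlier sketch; the helper is redefined so that this block stands alone.

```python
def q_mod_p(a, p):
    """q_p(a) mod p, for a prime p not dividing a."""
    return (pow(a, p - 1, p * p) - 1) // p


def additive(p, limit=30):
    """Check q_p(a) + q_p(b) == q_p(ab) (mod p) for 1 <= a, b < limit with p not dividing ab."""
    return all(
        (q_mod_p(a, p) + q_mod_p(b, p) - q_mod_p(a * b, p)) % p == 0
        for a in range(1, limit) for b in range(1, limit)
        if a % p and b % p
    )


print(all(additive(p) for p in [3, 5, 7, 11, 13]))  # True
```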


Binomial coefficients 


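We have seen that $(-1)^j\binom{p-1}{j} \equiv 1 \pmod p$. What about mod $p^2$? In that case
$$(-1)^j\binom{p-1}{j} = \prod_{i=1}^{j}\Big(1 - \frac{p}{i}\Big) \equiv 1 - p\sum_{i=1}^{j}\frac1i \pmod{p^2}.$$
If $j = \frac{p-1}{2}$, we can use (7.31.1) and (7.31.2) to deduce that
$$(-1)^{\frac{p-1}{2}}\binom{p-1}{\frac{p-1}{2}} \equiv 1 + 2p\,q_p(2) \equiv 1 + p\,q_p(4) = 4^{p-1} \pmod{p^2}.$$
In 1894 Morley took this one step further by proving that
$$(-1)^{\frac{p-1}{2}}\binom{p-1}{\frac{p-1}{2}} \equiv 4^{p-1} \pmod{p^3}$$
for all primes $p \ge 5$, and that this holds mod $p^4$ if and only if $p$ divides the Bernoulli number $B_{p-3}$.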
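Morley's congruence can be spot-checked with exact binomial coefficients via `math.comb` (Python 3.8+). So can its failure modulo $p^4$ for small primes, none of which divide $B_{p-3}$. A sketch (helper name ours):

```python
from math import comb


def morley_holds(p, k):
    """Does (-1)^((p-1)/2) * C(p-1, (p-1)/2) == 4^(p-1) hold mod p^k?"""
    h = (p - 1) // 2
    return ((-1) ** h * comb(p - 1, h) - 4 ** (p - 1)) % p ** k == 0


small = [5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(all(morley_holds(p, 3) for p in small))  # True
print(any(morley_holds(p, 4) for p in small))  # False
print(morley_holds(3, 2), morley_holds(3, 3))  # True False
```

The last line shows that the mod $p^2$ statement also holds for $p = 3$, whereas Morley's mod $p^3$ statement needs $p \ge 5$.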




Exercise 7.31.1. (a) Show that $\frac12\binom{2p}{p} = \prod_{n=1}^{p-1}\big(1 + \frac{p}{n}\big)$.
(b) By expanding the product, deduce that for odd primes $p$, we have
$$\frac12\binom{2p}{p} \equiv 1 + p\sum_{n=1}^{p-1}\frac1n + \frac{p^2}{2}\left(\Big(\sum_{n=1}^{p-1}\frac1n\Big)^2 - \sum_{n=1}^{p-1}\frac1{n^2}\right) \pmod{p^3}.$$
(c) Use Corollary and exercise 7.10.12 to deduce that $\binom{2p}{p} \equiv 2 \pmod{p^3}$ for all primes $p > 3$.
(d)‡ Prove Wolstenholme’s Theorem that, for any prime $p > 3$ and any integers $n \ge m \ge 0$,
$$\binom{np}{mp} \equiv \binom{n}{m} \pmod{p^3}.$$
(The difference is divisible by $p^4$ if and only if $p$ divides the Bernoulli number $B_{p-3}$.)
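Exercise 7.31.1(c) and Wolstenholme’s Theorem can be spot-checked in the same way. Note that $p = 3$ genuinely fails, as $\binom63 - 2 = 18$. A sketch (helper name ours):

```python
from math import comb


def wolstenholme_holds(p, n, m):
    """Does C(np, mp) == C(n, m) (mod p^3)?"""
    return (comb(n * p, m * p) - comb(n, m)) % p ** 3 == 0


print(all(wolstenholme_holds(p, n, m)
          for p in [5, 7, 11, 13]
          for n in range(6) for m in range(n + 1)))   # True
print(wolstenholme_holds(3, 2, 1))                    # False
```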


Bernoulli numbers modulo p 


In appendix 0B we introduced the Bernoulli numbers and showed that they are rational numbers; moreover $B_1 = -\frac12$ and $B_n = 0$ if $n$ is odd and $> 1$. It is important to know the prime factors of the denominators of the $B_{2n}$:


Theorem 7.13 (The von Staudt–Clausen Theorem). Prime $p$ divides the denominator of $B_{2n}$ if and only if $p - 1$ divides $2n$. The denominator of $B_{2n}$ is always squarefree; in fact
$$B_{2n} + \sum_{\substack{p \text{ prime}\\ p-1 \mid 2n}}\frac1p \ \text{ is an integer for all } n \ge 1. \tag{7.31.3}$$


Proof. For each prime $p$, we will verify that $pB_{2n} \equiv 0 \pmod p$ if $p - 1 \nmid 2n$, and $pB_{2n} \equiv p - 1 \pmod p$ if $p - 1 \mid 2n$, which implies the theorem. We will prove this by induction on $n \ge 1$. It is evidently true for $n = 1$ as $B_2 = \frac16$, so now assume $n \ge 2$, and that the result holds for all $m \le n-1$. By Theorem 0.1 with $k = 2n+1$ we have
$$\sum_{j=0}^{p-1} j^{2n} = \frac{B_{2n+1}(p) - B_{2n+1}}{2n+1} = \frac{1}{2n+1}\sum_{r=0}^{2n}\binom{2n+1}{r}B_r\,p^{2n+1-r}. \tag{7.31.4}$$
Our goal is to evaluate both sides of this equation mod $p$. We begin with the right-hand side. The $r = 1$ term gives $-p^{2n}/2$, which is $\equiv 0 \pmod p$. We are left with the terms with $r = 2m$, $0 \le m \le n$ (as $B_r = 0$ when $r$ is odd and $> 1$). For each such term, $\frac{1}{2n+1}\binom{2n+1}{2m}$ (which equals $\frac{1}{2m}\binom{2n}{2m-1}$ when $m \ge 1$) is a rational number whose denominator divides both $2n+1$ and $2m$, as each binomial coefficient is an integer, and therefore divides $(2n+1, 2m)$. Let $p^e \| (2n+1, 2m)$, so that the denominator of $\frac{p^e}{2n+1}\binom{2n+1}{2m}$ is not divisible by $p$. Now $p^e$ divides $2n+1-2m$, and so, if $m \le n-1$: when $e \ge 2$, we have $2n-1-2m \ge p^e - 2 \ge 2^e - 2 \ge e$; when $e = 0$ or $1$, we have $2n-1-2m \ge 1 \ge e$. By the induction hypothesis $p^2B_{2m} \equiv 0 \pmod p$ for all $m \le n-1$. Collecting all this information together gives that if $0 \le m \le n-1$, then
$$\frac{1}{2n+1}\binom{2n+1}{2m}B_{2m}\,p^{2n+1-2m} = \frac{p^e}{2n+1}\binom{2n+1}{2m}\cdot p^2B_{2m}\cdot p^{2n-1-2m-e} \equiv 0 \pmod p.$$
Substituting these congruences into (7.31.4), we obtain
$$pB_{2n} \equiv \sum_{j=0}^{p-1} j^{2n} \equiv \begin{cases} p-1 & \text{if } p-1 \text{ divides } 2n,\\ 0 & \text{otherwise}\end{cases} \pmod p,$$
by Corollary 7.5.2, and the result follows.
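The theorem is easy to test with exact rational arithmetic. The sketch below computes $B_0, \dots, B_N$ from the standard recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0$ for $m \ge 1$, which gives $B_1 = -\frac12$ as in the text. This recurrence is our choice and not necessarily the one used in appendix 0B; helper names are ours.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(N):
    """[B_0, ..., B_N] with B_1 = -1/2, via sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def von_staudt_clausen_holds(B, n):
    """Is B_{2n} + (sum of 1/p over primes p with (p - 1) | 2n) an integer?"""
    s = B[2 * n] + sum(Fraction(1, p) for p in range(2, 2 * n + 2)
                       if is_prime(p) and (2 * n) % (p - 1) == 0)
    return s.denominator == 1


B = bernoulli_numbers(60)
print(B[12])                                                      # -691/2730
print(all(von_staudt_clausen_holds(B, n) for n in range(1, 31)))  # True
```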


Sums of powers of integers modulo $p^3$


Assume, for convenience, that $p > 3$. By Theorem 0.1 and (0.6.1) we have
$$\sum_{n=0}^{p-1} n^{k-1} = \frac1k\sum_{r=0}^{k-1}\binom{k}{r}B_r\,p^{k-r}. \tag{7.31.5}$$
If $p$ does not divide $(r, k)$, then it does not divide the denominator of $\frac1k\binom kr$, and so the $r$th term is $\equiv 0 \pmod{p^3}$ whenever $k - r \ge 4$, by the von Staudt–Clausen Theorem (which tells us that $pB_r$ has denominator coprime to $p$). Otherwise the $r$th term equals $p^f$ times a rational number whose denominator is coprime to $p$, namely $\frac{p^e}{k}\binom kr\cdot pB_r\cdot p^{f}$, where $f = k - r - 1 - e$ and $p^e \| (k, r)$ with $e \ge 1$. Now $p^e$ divides $k - r > 0$, and so $f \ge p^e - e - 1 \ge 5^e - e - 1 \ge 3$. Therefore if $m = k - 1 \ge 2$, then
$$\sum_{n=0}^{p-1} n^m \equiv B_m\,p + \frac m2B_{m-1}\,p^2 + \frac{m(m-1)}{6}B_{m-2}\,p^3 \pmod{p^3}.$$
A similar analysis would allow us to extend this to modulo any power of $p$. We deduce from this that if $p > 3$ and $m \ge 2$, then
$$\sum_{n=1}^{p-1} n^m \equiv \begin{cases} pB_m \pmod{p^2} & \text{if } m \not\equiv 1 \pmod{p-1},\\ -mp/2 \pmod{p^2} & \text{if } m \equiv 1 \pmod{p-1},\end{cases} \tag{7.31.6}$$
since $\frac m2B_{m-1}p^2 \equiv 0 \pmod{p^2}$ unless $p - 1$ divides $m - 1$, by the von Staudt–Clausen Theorem, in which case $B_m = 0$ and $pB_{m-1} \equiv -1 \pmod p$.

This improves on Corollary 7.5.2. Note that (7.31.6) is $\equiv 0 \pmod{p^2}$ if $m$ is odd and $m \not\equiv 1 \pmod{p-1}$, though this is more easily proved in exercise 7.10.12. It is also $\equiv 0 \pmod{p^2}$ when the exponent $m = p$.
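Here is a sketch checking (7.31.6) for a few primes $p > 3$. It reduces the rational number $pB_m$ modulo $p^2$ by inverting its denominator, which the von Staudt–Clausen Theorem guarantees is coprime to $p$. The Bernoulli recurrence is repeated so the block runs on its own; `frac_mod` is our own helper.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(N):
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def frac_mod(x, n):
    """Reduce a rational x whose denominator is coprime to n, modulo n."""
    return x.numerator * pow(x.denominator, -1, n) % n


def check_7316(p, m, B):
    q = p * p
    lhs = sum(pow(n, m, q) for n in range(1, p)) % q
    if (m - 1) % (p - 1) == 0:
        rhs = frac_mod(Fraction(-m * p, 2), q)   # -mp/2
    else:
        rhs = frac_mod(p * B[m], q)              # p B_m
    return lhs == rhs


B = bernoulli_numbers(40)
print(all(check_7316(p, m, B)
          for p in [5, 7, 11, 13, 17, 19, 23] for m in range(2, 41)))  # True
```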


Exercise 7.31.2. Prove that
$$\sum_{n=1}^{p-1} n^k\,q_p(n) \equiv \begin{cases} 1/2 \pmod p & \text{if } k = 1,\\ B_{p-1+k} - B_k \pmod p & \text{if } 2 \le k \le p-1.\end{cases}$$


The Wilson quotient 


Wilson’s Theorem states that $p$ divides $(p-1)! + 1$. We define the Wilson quotient
$$w_p := \frac{(p-1)! + 1}{p}.$$
By the additivity property (7.31.2) of Fermat quotients we obtain
$$\sum_{n=1}^{p-1} q_p(n) \equiv q_p((p-1)!) = \frac{(pw_p - 1)^{p-1} - 1}{p} \equiv -(p-1)w_p \equiv w_p \pmod p.$$
Therefore
$$p - 1 + pw_p \equiv \sum_{n=1}^{p-1}\big(1 + p\,q_p(n)\big) = \sum_{n=1}^{p-1} n^{p-1} \equiv pB_{p-1} \pmod{p^2},$$
by (7.31.6) (and remember that the von Staudt–Clausen Theorem implies that $pB_{p-1} \equiv -1 \pmod p$). We obtain the surprising connection:
$$w_p \equiv \frac{pB_{p-1} - (p-1)}{p} \pmod p. \tag{7.31.7}$$
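Congruence (7.31.7) can be compared against the definition of $w_p$ directly. The following self-contained sketch uses the same Bernoulli recurrence as before (helper names ours). It also recovers the Wilson primes below 60, that is, the $p$ with $p \mid w_p$:

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(N):
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def frac_mod(x, n):
    return x.numerator * pow(x.denominator, -1, n) % n


def wilson_quotient(p):
    return (factorial(p - 1) + 1) // p


B = bernoulli_numbers(60)
primes = [5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
print(all(wilson_quotient(p) % p == frac_mod((p * B[p - 1] - (p - 1)) / p, p)
          for p in primes))                                   # True
print([p for p in primes if wilson_quotient(p) % p == 0])     # [5, 13]
```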


Exercise 7.31.3. Show that if $p > 3$, then $\prod_{j=1}^{p-1}(ap + j) \equiv (p-1)! \pmod{p^2}$ for every integer $a$.


Beyond Fermat’s Little Theorem 


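Let $p$ be an odd prime. We have seen that $\prod_{n=0}^{p-1}(x - n) \equiv x^p - x \pmod p$, but what about mod $p^2$? We define
$$L_p := -\log\Big(\prod_{n=1}^{p-1}(1 - nx)\Big) = \sum_{k\ge1}\Big(\sum_{n=1}^{p-1}n^k\Big)\frac{x^k}{k}.$$
We will restrict our attention to $k \le p-1$ (as the original polynomial has degree $p-1$), and so (7.31.6) implies (for $k = 1$ use $\sum_{n=1}^{p-1} n = p(p-1)/2 \equiv pB_1 \pmod{p^2}$ instead) that
$$L_p \equiv \sum_{k=1}^{p-1} pB_k\frac{x^k}{k} \pmod{(p^2, x^p)}.$$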
k=1 


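Therefore $L_p \equiv x^{p-1} \pmod{(p, x^p)}$, since only the $k = p-1$ term survives mod $p$ and $pB_{p-1}/(p-1) \equiv 1 \pmod p$; and so $L_p^2 \equiv 0 \pmod{(p^2, x^p)}$. Therefore
$$\prod_{n=1}^{p-1}(1 - nx) = e^{-L_p} \equiv 1 - L_p \equiv 1 - \sum_{k=1}^{p-1}pB_k\frac{x^k}{k} \pmod{(p^2, x^p)}.$$
Since the left-hand side has degree $p - 1$, the two sides are congruent mod $p^2$.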
Replacing $x$ by $1/x$, multiplying through by $x^p$, and using (7.31.7), we deduce that
$$\prod_{n=0}^{p-1}(x - n) \equiv x^p - x + p\Big(w_p\,x + \sum_{m=2}^{p-1}B_{p-m}\frac{x^m}{m}\Big) \pmod{p^2}. \tag{7.31.8}$$
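Congruence (7.31.8) can be verified coefficient by coefficient. Expand $\prod_{n=0}^{p-1}(x-n)$ modulo $p^2$ and compare it with the right-hand side. There only $B_{p-m}/m \bmod p$ is needed, because that term is multiplied by $p$. A self-contained sketch (coefficient lists are indexed by the power of $x$; names are ours):

```python
from fractions import Fraction
from math import comb, factorial


def bernoulli_numbers(N):
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def frac_mod(x, n):
    return x.numerator * pow(x.denominator, -1, n) % n


def falling_product_mod(p, mod):
    """Coefficients of prod_{n=0}^{p-1} (x - n) reduced mod `mod`; c[i] is the x^i coefficient."""
    c = [1]
    for n in range(p):
        new = [0] * (len(c) + 1)
        for i, ci in enumerate(c):
            new[i + 1] = (new[i + 1] + ci) % mod   # contribution of ci * x
            new[i] = (new[i] - n * ci) % mod       # contribution of ci * (-n)
        c = new
    return c


def rhs_7318(p, B):
    """Coefficients of x^p - x + p(w_p x + sum_{m=2}^{p-1} B_{p-m} x^m / m) mod p^2."""
    q = p * p
    w = (factorial(p - 1) + 1) // p
    c = [0] * (p + 1)
    c[p] = 1
    c[1] = (-1 + p * w) % q
    for m in range(2, p):
        c[m] = p * frac_mod(B[p - m] / m, p) % q
    return c


B = bernoulli_numbers(30)
print(all(falling_product_mod(p, p * p) == rhs_7318(p, B)
          for p in [3, 5, 7, 11, 13, 17, 19, 23]))   # True
```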


Exercise 7.31.4. Prove that
$$\prod_{n=0}^{p-1}(x - n) - \prod_{n=0}^{p-1}(x + n) \equiv p\,x^{p-1} \pmod{p^2}$$
in two ways. First deduce it from (7.31.8), and then prove it by substituting in $x = 0, 1, \dots, p-1$.


Reference for this section

[1] Emma Lehmer, On Congruences Involving Bernoulli Numbers and the Quotients of Fermat and Wilson, Ann. of Math. 39 (1938), 350–360.




7.32. Frequency of p-divisibility 
Fermat quotients 


We do not know much about the value of $q_p(2) \pmod p$. Our best guess is that if we do not understand a sequence of mathematical objects and do not see any algebraic structure to prejudice it one way or another, then it must be “randomly distributed”, at least with the correct interpretation of what “random” means. In this case, we guess that $q_p(2)$ is as likely to be in any given residue class mod $p$ as any other, in particular $0 \pmod p$, and therefore the “probability” that $q_p(2) \equiv 0 \pmod p$ is roughly $1/p$. This is a meaningless statement, but it can be reinterpreted much like the Gauss–Cramér model for primes discussed in section 5.15 of appendix 5C. So let $X_2, X_3, X_5, X_7, \dots$ be a sequence of independent random variables indexed by the primes, with $\mathrm{Prob}(X_p = 1) = \frac1p$ and $\mathrm{Prob}(X_p = 0) = 1 - \frac1p$. We think of the sequence $x_2, x_3, \dots$, with $x_p = 1$ whenever $p$ divides $q_p(2)$ and $x_p = 0$ otherwise, as being typical of the sequences in this probability space. Therefore we expect that
$$\#\{p \le x : p \text{ divides } q_p(2)\} \ \text{ is roughly } \ \mathbb{E}\Big(\sum_{p\le x}X_p\Big) = \sum_{p\le x}\mathbb{E}(X_p) = \sum_{p\le x}\frac1p,$$
which $\to \infty$ by (6.12.3), growing like $\log\log x$ by (6.12.4). This suggests that there are infinitely many primes $p$ for which $p$ divides $q_p(2)$, but that they should be extremely rare. There are just two examples known with $p < 7.2 \times 10^{16}$, so when should we expect to find the next example? Our best guess is that it should be $< x$ where $\log\log x - \log\log(7.2 \times 10^{16})$ is roughly 1; that is, $x \approx 10^{46}$. This is so large that we may never be able to compute another example, even if examples exist as frequently as expected!
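The estimate $x \approx 10^{46}$ is plain arithmetic. The condition $\log\log x - \log\log B = 1$ means $\log x = e\log B$, so $\log_{10} x = e\log_{10} B$ with $B = 7.2 \times 10^{16}$. A two-line check:

```python
import math

B = 7.2e16                          # search bound quoted above
log10_x = math.e * math.log10(B)    # log log x - log log B = 1  <=>  log x = e * log B
print(round(log10_x, 2))            # 45.82, i.e. x is about 10^46
```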


Bernoulli numbers 


How frequently are the Bernoulli numbers $B_{2n}$ divisible by a given prime $p$, with $2 \le 2n \le p-3$?$^{17}$ Again we have little idea, but reams of data suggest that the $B_{2n} \pmod p$ are distributed much like random numbers mod $p$. If this is so, then we can guess that for a random large prime $p$ and a random $n$, the probability that $p$ divides (the numerator of) $B_{2n}$ is $1/p$. If these probabilities are “independent”, then the “probability” that $p$ divides none of these numerators is
$$\Big(1 - \frac1p\Big)^{\frac{p-3}{2}} \to e^{-1/2} = 0.6065306597\ldots \quad \text{as } p \to \infty.$$
Calculations with Bernoulli numbers suggest that this is about right, though we have absolutely no idea how to prove this. This analysis also suggests that the number of $B_{2n}$ divisible by $p$ should behave like a Poisson variable with parameter $\frac12$. In other words, we can guess that for each $m \ge 0$, the proportion of primes $p$ for which $p \mid B_{2n}$ for exactly $m$ values of $2n$ with $2 \le 2n \le p-3$ is $\frac{e^{-1/2}}{2^m\,m!}$.


17 In a famous early paper in algebraic number theory Kummer showed that a class number related to cyclotomic fields is divisible by $p$ if and only if one of these Bernoulli numbers is divisible by $p$.
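The start of the data alluded to above can be reproduced on a small scale. For each prime $p < 100$, the sketch below lists the even indices $2 \le 2n \le p-3$ for which $p$ divides the numerator of $B_{2n}$. Bernoulli numbers again come from the standard recurrence, and the names are ours. Such a tiny sample cannot test the $e^{-1/2}$ heuristic, but it does show the first examples.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(N):
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def divisible_indices(p, B):
    """Even k = 2n with 2 <= k <= p - 3 and p dividing the numerator of B_k."""
    return [k for k in range(2, p - 2, 2) if B[k].numerator % p == 0]


B = bernoulli_numbers(100)
hits = {p: divisible_indices(p, B) for p in range(3, 100) if is_prime(p)}
print({p: ks for p, ks in hits.items() if ks})   # {37: [32], 59: [44], 67: [58]}
```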


Appendix 7I. Primitive prime factors of recurrence sequences


7.33. Primitive prime factors 


Prime $p$ is a primitive prime factor of $a^m - 1$ if $p$ divides $a^m - 1$ but does not divide $a^r - 1$ for any $1 \le r \le m-1$. In other words, $a$ has order $m$ mod $p$, and so $p \equiv 1 \pmod m$, by Theorem 7.1. Moreover $p$ divides $\phi_m(a)$ but not $\phi_r(a)$ for any $1 \le r \le m-1$. (Here the $\phi_r(x)$ are the cyclotomic polynomials defined in appendix 4E and used in section 7.9.)


Proposition 7.33.1. If $a$ has order $m$ mod $p$, then $p \mid \phi_n(a)$ if and only if $n/m$ is a power of $p$. Moreover if $n \ne m$, then $p^2 \nmid \phi_n(a)$, except when $a \equiv 3 \pmod 4$ with $m = 1$ and $n = p = 2$.


Proof. Now $(m, p) = 1$ as $m$ divides $p - 1$. If $p \mid \phi_n(a)$, which divides $a^n - 1$, then $m \mid n$, as $a$ has order $m$ mod $p$. We now prove that $n/m$ is a power of $p$. If not, then there exists a prime $q \ne p$ which divides $n/m$. Then $m$ divides $n/q$, so $p$ divides $a^{n/q} - 1$, which divides $a^n - 1$. By Lemma 7.17.1(a) we deduce that $\frac{a^n - 1}{a^{n/q} - 1}$ is not divisible by $p$. But $\phi_n(a)$ is a factor of $\frac{a^n - 1}{a^{n/q} - 1}$, and so it cannot be divisible by $p$, a contradiction.

Now let $n = mp^k$ with $k \ge 1$. By Proposition 7.17.1 we have that $p$ divides $\frac{a^n - 1}{a^{n/p} - 1}$ but $p^2$ does not (except perhaps if $p = 2$ and 4 divides $a^{n/2} + 1$). Moreover
$$\frac{a^n - 1}{a^{n/p} - 1} = \frac{a^{mp^k} - 1}{a^{mp^{k-1}} - 1} = \prod_{d \mid m}\phi_{dp^k}(a),$$
and $m \nmid dp^k$ for $d < m$, so that $p \nmid \phi_{dp^k}(a)$ for these $d$ by the first part. Hence $p$ divides $\phi_n(a)$ but $p^2$ does not (except perhaps if $p = 2$, $n/2$ is odd, and $a \equiv 3 \pmod 4$). In this last case, $a$ must have order 1 (mod 2), and so $m = 1$, $n = 2$.


Corollary 7.33.1. Let $n$ be an integer $\ge 2$. If prime $p$ divides $\phi_n(a)/(n, \phi_n(a))$, then $a$ has order $n$ mod $p$. In particular, if $|\phi_n(a)| > n$, then there exists a prime $p \equiv 1 \pmod n$ that is a primitive prime factor of $a^n - 1$.


Proof. If $a$ has order $m$ mod $p$ where $m < n$, and $p$ divides $\phi_n(a)$, then $n = mp^k$ for some $k \ge 1$, by Proposition 7.33.1. Moreover, $p^2$ does not divide $\phi_n(a)$, while $p$ divides $n$. So $p$ divides $(n, \phi_n(a))$ to the same power as it divides $\phi_n(a)$, and therefore $p$ does not divide $\phi_n(a)/(n, \phi_n(a))$. Therefore $a$ has order $n \pmod p$.

Now $(n, \phi_n(a)) \le n$, and so if $|\phi_n(a)| > n$, then $|\phi_n(a)|/(n, \phi_n(a)) > 1$. Therefore such a prime $p$ exists, and it must be $\equiv 1 \pmod n$.


Exercise 7.33.1. (a) Use exercise 16.4 to prove that if $|a| \ge 2$, then $|\phi_n(a)| > \alpha\,|a|^{\phi(n)}$, where $\alpha = \alpha(a) := \prod_{j\ge1}(1 - 1/|a|^j)$. Note that $\alpha(a) \ge \alpha(2) = .288788\ldots$.
(b)† Deduce that $a^n - 1$ has a primitive prime factor for every integer $a \ne -1, 0, 1$ and $n \ge 1$, except for the special cases $n = 1$, $a = 2$; or $n = 2$, $a = -1 + 2^k$ for some integer $k$; or $n = 3$, $a = -2$; or $n = 6$, $a = 2$.


We have therefore proved the following: 


Corollary 7.33.2. If $a$ is an integer $> 1$, then $a^n - 1$ has a primitive prime factor for all $n \ge 1$, except for $2^1 - 1$, $(2^k - 1)^2 - 1$, and $2^6 - 1$. Moreover $a^n + 1$ has a primitive prime factor for all $n \ge 1$ except $2^3 + 1$.

Proof. The first part follows from exercise 7.33.1(b). Now $a^n + 1$ has the same primitive prime factors as $a^{2n} - 1$, so the only possible exceptions correspond to the even exponent cases in the first part, leading to just the one example.
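Corollary 7.33.2 is easy to test without factoring anything. If a prime divides both $a^r - 1$ and $a^n - 1$, then it divides $a^{(r,n)} - 1$, and $(r, n)$ is a proper divisor of $n$ when $r < n$. So $a^n - 1$ has a primitive prime factor exactly when something is left after stripping from it every prime it shares with $a^d - 1$ for the proper divisors $d$ of $n$. A sketch (our own helper):

```python
from math import gcd


def has_primitive_prime_factor(a, n):
    """Does a^n - 1 have a prime factor dividing no a^r - 1 with 1 <= r < n?  (a >= 2)"""
    x = a ** n - 1
    for d in range(1, n):
        if n % d == 0:
            g = gcd(x, a ** d - 1)
            while g > 1:          # strip every prime that x shares with a^d - 1
                x //= g
                g = gcd(x, g)
    return x > 1


print([n for n in range(1, 41) if not has_primitive_prime_factor(2, n)])  # [1, 6]
print([(a, n) for a in range(3, 11) for n in range(1, 21)
       if not has_primitive_prime_factor(a, n)])                           # [(3, 2), (7, 2)]
```

The two exceptions in the second list are the cases $a = 2^k - 1$, $n = 2$ of the corollary.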


One can also deduce Theorem 7.8.


Prime power divisibility of second-order linear recurrence sequences 


The next exercise gives a generalization of Proposition 7.17.1 to second-order linear recurrence sequences. Let $\{x_n : n \ge 0\}$ be the second-order linear recurrence sequence given by
$$x_n = ax_{n-1} + bx_{n-2} \ \text{ for all } n \ge 2, \ \text{ with } x_0 = 0 \text{ and } x_1 = 1.$$
Exercise 2.5.19 implies that, for any given $m \ge 2$, $x_m$ divides $x_{2m}, x_{3m}, \dots$. Let $X_k := x_{km}/x_m$ be the sequence of quotients. Applying exercise 0.7.3(c) (with $m$ replaced by $(k-2)m$ and $n$ by $m$), we find that $X_k$ starts $X_0 = 0$, $X_1 = 1$, and then
$$X_k = y_mX_{k-1} + z_mX_{k-2} \ \text{ for all } k \ge 2,$$
where, with $\alpha$ and $\beta$ the roots of $x^2 - ax - b$,
$$x^2 - y_mx - z_m = (x - \alpha^m)(x - \beta^m).$$
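The claim about the quotients $X_k$ can be checked numerically. In the sketch below, $y_m = \alpha^m + \beta^m$ comes from the companion sequence $v_0 = 2$, $v_1 = a$, $v_n = av_{n-1} + bv_{n-2}$, and $z_m = -(\alpha\beta)^m = -(-b)^m$, since $\alpha\beta = -b$. The names are ours.

```python
def sequences(a, b, N):
    """x_n (x_0 = 0, x_1 = 1) and v_n = alpha^n + beta^n (v_0 = 2, v_1 = a), for n < N."""
    x, v = [0, 1], [2, a]
    for _ in range(2, N):
        x.append(a * x[-1] + b * x[-2])
        v.append(a * v[-1] + b * v[-2])
    return x, v


def quotients_satisfy_recurrence(a, b, m, K=8):
    x, v = sequences(a, b, m * K + 1)
    if any(x[k * m] % x[m] for k in range(K)):
        return False                      # x_m should divide every x_{km}
    X = [x[k * m] // x[m] for k in range(K)]
    y, z = v[m], -((-b) ** m)
    return X[:2] == [0, 1] and all(X[k] == y * X[k - 1] + z * X[k - 2] for k in range(2, K))


print(all(quotients_satisfy_recurrence(a, b, m)
          for a, b in [(1, 1), (2, 3), (3, -2), (1, -2)] for m in range(1, 7)))  # True
```

For the Fibonacci numbers with $m = 3$, for example, this gives $X_k = F_{3k}/2 = 0, 1, 4, 17, 72, \dots$ with $y_3 = 4$ and $z_3 = 1$.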

Exercise 7.33.2.† Suppose that $(a, b) = 1$. Let $\Delta_m$ denote the discriminant of the minimum polynomial for $(X_k)_{k\ge0}$.


(a) Prove that ym, zm, and Am are pairwise coprime. 
(b) Show that $\Delta_m = \Delta x_m^2$.
(c) Let $m_d$ be the smallest positive integer for which $d \mid x_{m_d}$. Now let $m = m_p$ for some prime $p$, and suppose that $p^k \| x_m$, so that $m_p = m_{p^2} = \dots = m_{p^k}$. Prove that $m_{p^{k+1}} = pm_p$ and, in general, $m_{p^{k+j}} = p^jm_p$ for all $j \ge 0$.


Exercise 7.33.3.† Suppose that $(a, b) = 1$ and that odd prime $p$ divides $\Delta$.
(a) Prove that $x_n \equiv n(a/2)^{n-1} \pmod\Delta$, and deduce that $m_p = p$.
(b) Show that if $p > 3$, then $x_n \equiv n(a/2)^{n-1} + \frac{n(n-1)(n-2)}{24}\Delta(a/2)^{n-3} \pmod{\Delta^2}$ for all $n \ge 3$.
(c) Deduce that $p$ divides $x_p$ but $p^2$ does not.


7.34. Closed form identities and sums of powers 


In section 0.2 we began our study of closed form expressions for sums of $k$th powers and observed the remarkable identity
$$s_3(N) = s_1(N)^2$$
for all $N \ge 1$, where $s_k(N) = 1^k + 2^k + \dots + N^k$. Moreover, we asked whether there are any other such identities. We now prove that there are not.


Corollary 7.34.1. Suppose there is an identity $r\prod_{i=1}^m s_{j_i}(N) = R\prod_{I=1}^M s_{J_I}(N)$ for some integers $r, R, j_1, \dots, j_m, J_1, \dots, J_M \ge 1$, where all the $j_i$ are different from the $J_I$, and $(r, R) = 1$. Then the identity must be of the form $s_3(N)^\ell = s_1(N)^{2\ell}$ for some integer $\ell \ge 1$.


Proof. By changing the order if necessary we may assume that $J_M \ge J_I, j_i$ for all $I \le M$ and $i \le m$. By Theorem 0.1 and (0.6.1), the main term of $s_k(N)$ is $N^{k+1}/(k+1)$. Therefore, comparing the coefficient of the leading term on both sides of our identity and multiplying through by denominators, we deduce that
$$r\prod_{I=1}^{M}(J_I + 1) = R\prod_{i=1}^{m}(j_i + 1),$$
which implies that $r$ divides $\prod_{i=1}^m(j_i + 1)$, since $(r, R) = 1$.

Let $N = 2$ in our identity and suppose that $p$ is a primitive prime factor of $s_{J_M}(2) = 2^{J_M} + 1$. Then $p$ divides $r\prod_{i=1}^m s_{j_i}(2)$, but $p$ cannot divide any $s_{j_i}(2) = 2^{j_i} + 1$ (as each $j_i < J_M$), and so $p$ divides $r$, which divides $\prod_{i=1}^m(j_i + 1)$. But $p \equiv 1 \pmod{2J_M}$, so that $p \ge 2J_M + 1$, and this yields a contradiction since it implies that $p$ cannot divide any $j_i + 1$. Hence $2^{J_M} + 1$ does not have a primitive prime factor, which implies that $J_M = 3$ by Corollary 7.33.2.


Therefore any identity involves only $s_1(N)$, $s_2(N)$, and $s_3(N)$. Now $s_2(N) = N(N+1)(2N+1)/6$ has a factor $2N + 1$ which is coprime with $N(N+1)$, and so $s_2(N)$ cannot be involved in any identity. Comparing powers of $N(N+1)/2$, we deduce that the only possible identities take the form $s_3(N)^\ell = s_1(N)^{2\ell}$, as claimed.


7.35. Primitive prime factors and second-order linear recurrence sequences


The Bang–Zsigmondy Theorem (1892) states that if $a$ and $b$ are coprime positive integers, then $a^n - b^n$ has a primitive prime divisor for all $n > 1$, except for the special cases $n = 6$, $a = 2$, $b = 1$, and $n = 2$, $a + b = 2^k$. It can be proved by elaborating on the ideas above. One can also prove such results for second-order linear recurrence sequences that begin 0, 1:


Let $x_0 = 0$ and $x_1 = 1$, and let $x_n = ax_{n-1} + bx_{n-2}$ for all $n \ge 2$. Let $\Delta := a^2 + 4b$.

When $\Delta > 0$, it is easy to show that the $x_n$ grow exponentially fast (which is the key to the proof in exercise 7.33.1), since $x_n = (\alpha^n - \beta^n)/(\alpha - \beta)$ with $|\alpha| > \max(1, |\beta|)$. Using this, Carmichael showed in 1913 that if $\Delta > 0$, then $x_n$ has a primitive prime factor for each $n \ne 1, 2, 6$, except for $F_{12} = 144$, where $F_n$ is the Fibonacci sequence, and for $F'_{12}$, where $F'_n = (-1)^{n-1}F_n$.

The case with $\Delta < 0$ is much more difficult. Nonetheless, in 1974 Schinzel succeeded in showing that $x_n$ has a primitive prime factor once $n > n_0$, for some sufficiently large $n_0$, other than in the periodic case $a = \pm1$, $b = -1$. Determining the smallest possible value of $n_0$ has required great efforts, culminating in the beautiful work of Bilu, Hanrot, and Voutier [1], who proved that $n_0 = 30$ is best possible. Indeed, they show that such examples occur only for $n = 5, 7, 8, 10, 12, 13, 18, 30$. If $a = 1$, $b = -2$, then $x_5, x_8, x_{12}, x_{13}, x_{18}, x_{30}$ have no primitive prime factors; if $a = 1$, $b = -5$, then $x_7 = 1$; if $a = 2$, $b = -3$, then $x_{10}$ has no primitive prime factors; there are a handful of other examples besides, all with $n \le 12$.
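The examples above can be reproduced with a similar gcd-stripping idea. This time it relies on the strong divisibility property $\gcd(x_r, x_n) = |x_{\gcd(r,n)}|$, which holds for these sequences when $(a, b) = 1$. Here “primitive” means, as in the text, dividing $x_n$ but no earlier $x_r$. A sketch (names ours):

```python
from math import gcd


def sequence(a, b, N):
    """[x_0, ..., x_N] with x_0 = 0, x_1 = 1, x_n = a x_{n-1} + b x_{n-2}."""
    x = [0, 1]
    for _ in range(2, N + 1):
        x.append(a * x[-1] + b * x[-2])
    return x


def has_primitive(x, n):
    """Does x_n have a prime factor dividing no x_r with 1 <= r < n?  Assumes x_n != 0."""
    t = abs(x[n])
    for d in range(1, n):
        if n % d == 0:
            g = gcd(t, abs(x[d]))
            while g > 1:
                t //= g
                g = gcd(t, g)
    return t > 1


fib = sequence(1, 1, 40)
x12 = sequence(1, -2, 40)
print([n for n in range(1, 41) if not has_primitive(fib, n)])   # [1, 2, 6, 12]
print([n for n in range(5, 41) if not has_primitive(x12, n)])   # [5, 8, 12, 13, 18, 30]
print(has_primitive(sequence(1, -5, 7), 7), has_primitive(sequence(2, -3, 10), 10))  # False False
```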


It is conjectured that each $x_n$ with $n$ sufficiently large (say $n > n_0$) has a primitive prime factor which divides $x_n$ to the power 1. This would have several interesting applications, for example, to showing that no Fibonacci number after $12^2$ is a square, and that, for any integer $d$, the equation $F_n = dx^2$ has at most one solution with $n > n_0$.


Exercise 7.35.1. Assume that if $n > n_0$, then $x_n$ has a primitive prime factor which divides $x_n$ to the power 1 (that is, $p$ divides $x_n$ but $p^2$ does not).
(a) Show that the equation $x_n = y^2$ has no solutions with $n > n_0$.
(b) Show that, for any integer $d \ge 1$, the equation $x_n = dy^2$ has at most one solution with $n > n_0$.
(c) Show that there are no finite sequences $n_0 < n_1 < n_2 < \dots < n_r$ such that $x_{n_1}x_{n_2}\cdots x_{n_r}$ is a square.
(d) Make the same deductions assuming that if $n > n_0$, then $x_n$ has a primitive prime factor which divides $x_n$ to exactly an odd power.


Recently it was shown in [2] that the assumption in (d) is valid for $x_n = 2^n - 1$ with $n_0 = 6$, and even for any second-order linear recurrence sequence with $(a, b) = 1$, $b \equiv 2 \pmod 4$, $\Delta > 0$, and $n_0 = 6$. It is an open question whether this result can be extended to other recurrence sequences, the most interesting being the Fibonacci numbers.


It may seem laborious that we have to make the assumption that $x_n$ has a primitive prime factor which divides it to exactly an odd power, rather than to the power 1, but it is feasible (though unlikely) that this latter assumption is untrue:

Lemma 7.35.1. Assume that $p^2$ divides $2^{p-1} - 1$ for all primes $p > n_0$. If $q$ is a primitive prime factor of $x_n = 2^n - 1$ with $n > n_0$, then $q^2$ divides $x_n$. (The result in [2] then guarantees that some such $q$ divides $x_n$ to an odd power $\ge 3$.)


In section 7.32 of appendix 7H, we explained why we believe that it is extremely rare that $p^2$ divides $2^{p-1} - 1$; however, we cannot prove that it does not happen for all sufficiently large $p$, as in the hypothesis of the lemma.

Proof. If $q$ is a primitive prime factor of $x_n = 2^n - 1$, then $\mathrm{ord}_q(2) = n$, which divides $q - 1$, and so $q \ge n + 1 > n_0$. By the hypothesis, $q^2$ divides $2^{q-1} - 1$. Since $n$ divides $q - 1$ and $q \nmid (q-1)/n$, the number $2^n - 1$ is divisible by $q$ to the same power as $2^{q-1} - 1$, by Lemma 7.17.1(a). Therefore $q^2$ divides $x_n$.


References for this chapter
[1] Y. Bilu, G. Hanrot, and P. Voutier, Existence of primitive divisors of Lucas and Lehmer numbers, J. Reine Angew. Math. 539 (2001), 75–122.

[2] Andrew Granville, Primitive prime factors in second-order linear recurrence sequences, Acta Arithmetica 155 (2012), 431–452.

[3] A. Schinzel, Primitive divisors of the expression $A^n - B^n$ in algebraic number fields, J. Reine Angew. Math. 268/269 (1974), 27–33.


Chapter 8 


Quadratic residues 


In this chapter we will develop an understanding of the squares mod n, in particular 
how many there are and how to quickly identify whether a given residue is a square 
mod n. We mostly discuss the squares modulo primes and from there understand 
the squares mod prime powers via “lifting”, and modulo composites through the 
Chinese Remainder Theorem. 


8.1. Squares modulo prime p 


There are two types of squares mod $p$. We always have $0^2 \equiv 0 \pmod p$. Then there are the “quadratic residues (mod $p$)”, which are the non-zero residues $a \pmod p$ that are congruent to a square modulo $p$. All other residue classes are “quadratic non-residues”. If there is no ambiguity, we simply say “residues” and “non-residues”. In the next table we list the quadratic residues modulo each of the primes between 5 and 17.


Modulus Quadratic residues 
5 1,4 
7 1, 2,4 
11 1, 3, 4, 5,9 
13 1, 3, 4, 9, 10, 12 
17 1, 2, 4, 8, 9, 13, 15, 16 


Exercise 8.1.1. (a) Prove that 337 is not a square (that is, the square of an integer) by 
reducing it mod 5. 
(b) Prove that 391 is not a square by reducing it mod 7. 
(c) Prove that there do not exist integers $x$ and $y$ for which $x^2 - 3y^2 = -1$, by reducing any solution mod 3.


In each row of our table there seem to be $\frac{p-1}{2}$ quadratic residues mod $p$:

Lemma 8.1.1. The distinct quadratic residues mod $p$ are given by $1^2, 2^2, \dots, \big(\frac{p-1}{2}\big)^2 \pmod p$.


Proof. If $r^2 \equiv s^2 \pmod p$ with $1 \le s < r \le p - 1$, then $p \mid r^2 - s^2 = (r - s)(r + s)$, and so $p$ divides either $r - s$ or $r + s$. Now $0 < r - s < p$, and so $p$ does not divide $r - s$. Therefore $p$ divides $r + s$, and $0 < r + s < 2p$, so we must have $r + s = p$.

Hence the residues of $1^2, 2^2, \dots, \big(\frac{p-1}{2}\big)^2 \pmod p$ are distinct, and if $s = p - r$, then $s^2 \equiv (-r)^2 = r^2 \pmod p$. This implies our result.
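Lemma 8.1.1 and the table above can be reproduced directly. Squaring $1, \dots, \frac{p-1}{2}$ already gives $\frac{p-1}{2}$ distinct residues, and squaring every non-zero residue gives nothing new. A sketch (names ours):

```python
def residues_from_half(p):
    """Least residues of 1^2, 2^2, ..., ((p-1)/2)^2 mod p, in that order."""
    return [(r * r) % p for r in range(1, (p - 1) // 2 + 1)]


def quadratic_residues(p):
    """All non-zero squares mod p, sorted."""
    return sorted({(r * r) % p for r in range(1, p)})


for p in [5, 7, 11, 13, 17]:
    half = residues_from_half(p)
    assert len(set(half)) == len(half) == (p - 1) // 2    # distinct
    assert sorted(half) == quadratic_residues(p)          # and they are all of them
    print(p, quadratic_residues(p))
# 5 [1, 4]
# 7 [1, 2, 4]
# 11 [1, 3, 4, 5, 9]
# 13 [1, 3, 4, 9, 10, 12]
# 17 [1, 2, 4, 8, 9, 13, 15, 16]
```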


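Define the Legendre symbol as follows. For each odd prime $p$ let
$$\Big(\frac ap\Big) = \begin{cases} 0 & \text{if } a \equiv 0 \pmod p,\\ 1 & \text{if } a \text{ is a quadratic residue mod } p,\\ -1 & \text{if } a \text{ is a quadratic non-residue mod } p.\end{cases}$$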


Exercise 8.1.2. (a) Prove that if $a \equiv b \pmod p$, then $\big(\frac ap\big) = \big(\frac bp\big)$.
(b) Prove that $\sum_{a=0}^{p-1}\big(\frac ap\big) = 0$.


Corollary 8.1.1. There are exactly $1 + \big(\frac ap\big)$ residue classes $b \pmod p$ for which $b^2 \equiv a \pmod p$.

Proof. If $a$ is a quadratic non-residue, there are no solutions. For $a = 0$: if $b^2 \equiv 0 \pmod p$, then $b \equiv 0 \pmod p$, so there is just one solution. If $a$ is a quadratic residue, then, by definition, there exists $b$ such that $b^2 \equiv a \pmod p$. There are then the two solutions $b$ and $p - b$ (as $(p - b)^2 \equiv b^2 \equiv a \pmod p$), and no others, by the proof in Lemma 8.1.1 (or by Proposition 7.4.1). We have therefore proved
$$\#\{b \pmod p : b^2 \equiv a \pmod p\} = \begin{cases} 1 & \text{if } a \equiv 0 \pmod p,\\ 2 & \text{if } a \text{ is a quadratic residue mod } p,\\ 0 & \text{if } a \text{ is a quadratic non-residue mod } p.\end{cases}$$
This equals $1 + \big(\frac ap\big)$, by the definition of the Legendre symbol above.
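Corollary 8.1.1 can be checked by brute force, computing $\big(\frac ap\big)$ straight from its definition. The helper names are ours.

```python
def legendre_by_definition(a, p):
    a %= p
    if a == 0:
        return 0
    return 1 if any((x * x) % p == a for x in range(1, p)) else -1


def count_square_roots(a, p):
    return sum(1 for b in range(p) if (b * b - a) % p == 0)


small_primes = [3, 5, 7, 11, 13, 17, 19]
print(all(count_square_roots(a, p) == 1 + legendre_by_definition(a, p)
          for p in small_primes for a in range(p)))                      # True
print(all(sum(legendre_by_definition(a, p) for a in range(p)) == 0
          for p in small_primes))                                        # True
```

The second line is a numerical instance of Exercise 8.1.2(b).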


Theorem 8.1. We have $\big(\frac{ab}p\big) = \big(\frac ap\big)\big(\frac bp\big)$ for any integers $a, b$. That is:

(i) The product of two quadratic residues (mod $p$) is a quadratic residue.
(ii) The product of a quadratic residue and a non-residue is itself a non-residue.
(iii) The product of two quadratic non-residues (mod $p$) is a quadratic residue.


Proof (Gauss). (i) If $a \equiv A^2$ and $b \equiv B^2$, then $ab \equiv (AB)^2 \pmod p$.

Let $R := \{r \pmod p : (r/p) = 1\}$ be the set of quadratic residues mod $p$. We saw that if $(a/p) = 1$, then $(ar/p) = 1$ for all $r \in R$. In other words, $ar \in R$; that is, $aR \subseteq R$. The elements of $aR$ are distinct, so that $|aR| = |R|$, and therefore $aR = R$.

(ii) Let $N = \{n \pmod p : (n/p) = -1\}$ be the set of quadratic non-residues mod $p$, so that $N \cup R$ partitions the reduced residues mod $p$. By exercise 3.5.2 we deduce that $aR \cup aN$ also partitions the reduced residues mod $p$, and therefore $aN = N$ since $aR = R$. That is, the elements of the set $\{an : (n/p) = -1\}$ are all quadratic non-residues mod $p$.

By Lemma 8.1.1 we know that $|R| = \frac{p-1}{2}$, and hence $|N| = \frac{p-1}{2}$, since $N \cup R$ partitions the $p - 1$ reduced residues mod $p$.

(iii) In (ii) we saw that if $(n/p) = -1$ and $(a/p) = 1$, then $(na/p) = -1$. Hence $nR \subseteq N$ and, as $|nR| = |R| = \frac{p-1}{2} = |N|$, we deduce that $nR = N$. But $nR \cup nN$ partitions the reduced residues mod $p$, and so $nN = R$. That is, the elements of the set $\{nb : (b/p) = -1\}$ are all quadratic residues mod $p$.


Exercise 8.1.3. Suppose that prime p does not divide ab. 
(a) Prove that (24) = (2). 
(b) Prove that there are non-zero residues $x$ and $y \pmod p$ for which $ax^2 + by^2 \equiv 0 \pmod p$ if and only if $\big(\frac{-ab}p\big) = 1$.
Exercise 8.1.4. Prove that if odd prime $p$ divides $b^2 - 4ac$ but neither $a$ nor $c$, then $\big(\frac ap\big) = \big(\frac cp\big)$.


Exercise 8.1.5. Let $p$ be a prime $> 3$. Prove that if there is no residue $x \pmod p$ for which $x^2 \equiv 2 \pmod p$, and no residue $y \pmod p$ for which $y^2 \equiv 3 \pmod p$, then there is a residue $z \pmod p$ for which $z^2 \equiv 6 \pmod p$.


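We deduce from Theorem 8.1 that $\big(\frac{\cdot}{p}\big)$ is a multiplicative function. Therefore, if we have a factorization of $a$ into prime factors as $a = \pm q_1^{e_1}q_2^{e_2}\cdots q_k^{e_k}$, and $(a, p) = 1$, then$^1$
$$\Big(\frac ap\Big) = \Big(\frac{\pm1}p\Big)\prod_{i=1}^k\Big(\frac{q_i}p\Big)^{e_i} = \Big(\frac{\pm1}p\Big)\prod_{\substack{1\le i\le k\\ e_i \text{ odd}}}\Big(\frac{q_i}p\Big),$$
since $(q/p)^2 = 1$ whenever $p \nmid q$. This implies that $(q_i/p)^{e_i} = 1$ if $e_i$ is even, and $(q_i/p)^{e_i} = (q_i/p)$ if $e_i$ is odd. Therefore, in order to determine $\big(\frac ap\big)$ for all integers $a$, it is only necessary to know the values of $\big(\frac{-1}p\big)$, and of $\big(\frac qp\big)$ for all primes $q$.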


Exercise 8.1.6. One can write each non-zero residue mod p as a power of a primitive root g. 
(a) Prove that the quadratic residues are precisely those residues that are an even power of g, 
and the quadratic non-residues are those that are an odd power. 


(b) Deduce that $\big(\frac gp\big) = -1$.


Exercise 8.1.7. (a) Show that if $n$ is odd and $p$ divides $a^n - 1$, then $\big(\frac ap\big) = 1$.
(b) Show that if $n$ is prime and $p$ divides $a^n - 1$, but $a \not\equiv 1 \pmod p$, then $p \equiv 1 \pmod n$.
(c) Give an example to show that (b) can be false if we only assume that n is odd. 


Exercise 8.1.8. (a) Prove that, for every prime $p \ne 2, 5$, at least one of 2, 5, and 10 is a quadratic residue mod $p$.
(b)† Prove that, for every prime $p > 5$, there are two consecutive positive integers that are both quadratic residues mod $p$ and are both $\le 10$.


8.2. The quadratic character of a residue 


Fermat’s Little Theorem (Theorem 7.2) states that the $(p-1)$st power of any reduced residue mod $p$ is congruent to 1 (mod $p$). Are there other patterns to be found among the lower powers?


1 Each of “±” and “±1” is to be read as “either ‘+’ or ‘−’”. We deal with these two cases together since the proofs are entirely analogous, taking care throughout to be consistent with the choice of sign.


The powers of a mod 5
 a    a^2   a^3   a^4
 1     1     1     1
 2    -1    -2     1
-2    -1     2     1
-1     1    -1     1

The powers of a mod 7
 a    a^2   a^3   a^4   a^5   a^6
 1     1     1     1     1     1
 2    -3     1     2    -3     1
 3     2    -1    -3    -2     1
-3     2     1    -3     2     1
-2    -3    -1     2     3     1
-1     1    -1     1    -1     1


As expected the $(p-1)$st column is all 1’s, but there is another pattern that emerges: the entries in the “middle” column, that is, the $a^2$ column mod 5 and the $a^3$ column mod 7, are all $-1$’s and 1’s. This column represents the least residues of numbers of the form $a^{\frac{p-1}{2}} \pmod p$, and it appears that these are all $-1$’s and 1’s. Can we decide which are $+1$ and which are $-1$? For $p = 5$ we see that $1^2 \equiv 4^2 \equiv 1 \pmod 5$ and $2^2 \equiv 3^2 \equiv -1 \pmod 5$; recall that 1 and 4 are the quadratic residues mod 5. For $p = 7$ we see that $1^3 \equiv 2^3 \equiv 4^3 \equiv 1 \pmod 7$ and $3^3 \equiv 5^3 \equiv 6^3 \equiv -1 \pmod 7$; recall that 1, 2, and 4 are the quadratic residues mod 7. So we have observed a pattern: the $a$th entry in the middle column is $+1$ if $a$ is a quadratic residue mod $p$, and it is $-1$ if $a$ is a quadratic non-residue mod $p$; in either case it equals the value of the Legendre symbol, $\big(\frac ap\big)$. This observation was proved by Euler in 1732.


Theorem 8.2 (Euler’s criterion). We have $a^{\frac{p-1}{2}} \equiv \big(\frac ap\big) \pmod p$ for all odd primes $p$ and integers $a$.


Proof #1. If $\big(\frac ap\big) = 1$, then there exists $b$ such that $b^2 \equiv a \pmod p$, so that $a^{\frac{p-1}{2}} \equiv b^{p-1} \equiv 1 \pmod p$, by Fermat’s Little Theorem.

If $\big(\frac ap\big) = -1$, then we proceed as in Gauss’s proof of Wilson’s Theorem, though pairing up the residues slightly differently. Let
$$S = \{(r, s) : 1 \le r < s \le p-1,\ rs \equiv a \pmod p\}.$$
Note that if $rs \equiv a \pmod p$, then $r \not\equiv s \pmod p$, or else $a \equiv r^2 \pmod p$, contradicting that $\big(\frac ap\big) = -1$. Therefore each integer $m$, $1 \le m \le p-1$, appears exactly once, in exactly one pair in $S$. We deduce that
$$(p-1)! = \prod_{(r,s)\in S} rs \equiv a^{|S|} = a^{\frac{p-1}{2}} \pmod p,$$
and the result follows from Wilson’s Theorem.


For example, for $p = 13$, $a = 2$ we have
$$-1 \equiv 12! = (1\cdot2)(3\cdot5)(4\cdot7)(6\cdot9)(8\cdot10)(11\cdot12) \equiv 2^6 \pmod{13}.$$


Exercise 8.2.1.† Prove Euler’s criterion for $(a/p) = 1$, by evaluating $(p-1)! \pmod p$ as in the second part of proof #1, but now taking account of the solutions $r \pmod p$ to $r^2 \equiv a \pmod p$.


Proof #2 of Euler’s criterion. We began Proof #1 by showing that if $\big(\frac ap\big) = 1$, then $a^{\frac{p-1}{2}} \equiv 1 = \big(\frac ap\big) \pmod p$. This implies that $a$ is a root of $x^{\frac{p-1}{2}} - 1 \pmod p$. By Lemma 8.1.1 there are $\frac{p-1}{2}$ quadratic residues mod $p$, and we now know that these are all roots of $x^{\frac{p-1}{2}} - 1 \pmod p$. Since this polynomial has degree $\frac{p-1}{2}$, they are therefore all of its roots. That is,
$$x^{\frac{p-1}{2}} - 1 \equiv \prod_{\substack{1\le a\le p\\ (a/p)=1}}(x - a) \pmod p. \tag{8.2.1}$$
In (7.4.1) we noted that
$$x^{p-1} - 1 \equiv (x-1)(x-2)\cdots(x-(p-1)) \pmod p;$$
that is, the $p - 1$ roots of $x^{p-1} - 1 = \big(x^{\frac{p-1}{2}} - 1\big)\big(x^{\frac{p-1}{2}} + 1\big) \pmod p$ are precisely the reduced residues mod $p$, each occurring exactly once. Since the set of reduced residues mod $p$ is the union of the set of quadratic residues and the set of quadratic non-residues, we can divide this last equation through by (8.2.1), to obtain
$$x^{\frac{p-1}{2}} + 1 \equiv \prod_{\substack{1\le b\le p\\ (b/p)=-1}}(x - b) \pmod p. \tag{8.2.2}$$
This implies that if $b$ is a quadratic non-residue mod $p$, then $b^{\frac{p-1}{2}} + 1 \equiv 0 \pmod p$; that is, $b^{\frac{p-1}{2}} \equiv -1 = \big(\frac bp\big) \pmod p$.


We can use Euler’s criterion to determine the value of Legendre symbols as follows: $(3/13) = 1$ since $3^6 = 27^2 \equiv 1^2 = 1 \pmod{13}$, and $(2/13) = -1$ since $2^6 = 64 \equiv -1 \pmod{13}$.
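These computations are easy to automate. The Python sketch below (function names are illustrative, not from the text) evaluates $a^{(p-1)/2} \bmod p$ with Python’s built-in three-argument `pow`, which performs fast modular exponentiation, and compares the result against a brute-force set of the nonzero squares mod $p$. It is a numerical check of Theorem 8.2 for a few small primes, not a proof.

```python
def euler_symbol(a, p):
    """Return a^((p-1)/2) mod p written as 0, 1 or -1 (Euler's criterion); p an odd prime."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def squares_mod(p):
    """Brute-force set of the nonzero squares mod p."""
    return {x * x % p for x in range(1, p)}

def agrees_with_brute_force(p):
    """True if a^((p-1)/2) is 1 exactly on the squares and -1 on the non-squares."""
    qr = squares_mod(p)
    return all(euler_symbol(a, p) == (1 if a in qr else -1) for a in range(1, p))

print(euler_symbol(3, 13), euler_symbol(2, 13))                         # 1 -1
print([euler_symbol(a, 7) for a in range(1, 7)])                        # [1, 1, -1, 1, -1, -1]
print(all(agrees_with_brute_force(p) for p in (3, 5, 7, 11, 13, 101)))  # True
```

For instance, `euler_symbol(77, 101)` returns 1, so 77 is a square mod 101, even though this computation does not produce a square root.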


Exercise 8.2.2. Let $p$ be an odd prime. Explain how one can determine the integer $(a/p)$ by knowing $a^{(p-1)/2} \pmod p$. (Euler’s criterion gives a congruence, but here we are asking for the value of the integer $(a/p)$.)


Exercise 8.2.3. Use Euler’s criterion to reprove Theorem 


Proof #3 of Euler’s criterion. Let $g$ be a primitive root mod $p$. We have $g^{(p-1)/2} \equiv -1 \pmod p$ by exercise 7.5.2. Suppose that $a \equiv g^r \pmod p$ for some integer $r$, so that $a^{(p-1)/2} \equiv (g^r)^{(p-1)/2} = (g^{(p-1)/2})^r \equiv (-1)^r \pmod p$. If $a$ is a quadratic residue mod $p$, then $r$ is even by exercise 6], and so $a^{(p-1)/2} \equiv (-1)^r = 1 \pmod p$. If $a$ is a quadratic non-residue mod $p$, then $r$ is odd, and so $a^{(p-1)/2} \equiv (-1)^r = -1 \pmod p$.


Square roots and non-squares modulo p. How can we tell whether a reduced 
residue a (mod p) is a square mod p? One idea is to try to find the square root, but 
it is not clear how to go about this efficiently (for example, try to find the square 
root of 77 (mod 101)). One consequence of Euler’s criterion is that one does not 
have to try to find the square root to determine whether a given residue class is a 
square mod $p$. Indeed one can determine whether $a$ is a square mod $p$ by calculating $a^{(p-1)/2} \pmod p$. This might look like it will be equally difficult, but we have shown in section 7.13 of appendix 7A that one can calculate a high power of $a$ mod $p$ quite efficiently.


There are some special cases in which one can determine a square root of $a \pmod p$ quite easily; for example, when $p \equiv 3 \pmod 4$:

Exercise 8.2.4. Let $p$ be a prime $\equiv 3 \pmod 4$. Show that if $(a/p) = 1$ and $b \equiv a^{(p+1)/4} \pmod p$, then $b^2 \equiv a \pmod p$. (This idea is explored further in section of appendix 7C.)

However if $p \equiv 1 \pmod 4$, then it is not so easy to determine a square root. For example, $-1$ is a square mod $p$ (as we will prove in the next section), but we do not know a simple practical way to quickly determine a square root of $-1 \pmod p$.
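The formula of Exercise 8.2.4 translates directly into code. The sketch below (the name `sqrt_mod_3mod4` is hypothetical) assumes $p$ is a prime with $p \equiv 3 \pmod 4$, computes $b = a^{(p+1)/4} \bmod p$, and checks whether $b^2 \equiv a$. If $a$ is a non-residue the check fails (then $b^2 = a \cdot a^{(p-1)/2} \equiv -a$), so the function returns `None`.

```python
def sqrt_mod_3mod4(a, p):
    """A square root of a mod p, assuming p is prime with p % 4 == 3.

    Returns b = a^((p+1)/4) mod p when b*b = a (mod p), and None otherwise
    (which happens exactly when a is a quadratic non-residue mod p).
    """
    if p % 4 != 3:
        raise ValueError("this shortcut needs p = 3 (mod 4)")
    b = pow(a, (p + 1) // 4, p)
    return b if b * b % p == a % p else None

print(sqrt_mod_3mod4(2, 7))   # 4, since 4^2 = 16 = 2 (mod 7)
print(sqrt_mod_3mod4(3, 7))   # None: 3 is a non-residue mod 7
```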


How can one quickly find a quadratic non-residue mod p? One would think it 
would be easy, as half of the residues mod p are quadratic non-residues, but there 
is no simple way to guarantee finding one quickly. In practice it is most efficient 
to select numbers in $[1, p-1]$ at random, independently. The probability that any given selection is a quadratic residue is $1/2$, so the probability that every one of the first $k$ choices is a quadratic residue is $1/2^k$. Therefore, the probability that none of the first 20 selections is a quadratic non-residue mod $p$ is $2^{-20}$, which is less than one in a million. Moreover it is easy to verify whether each selection is a quadratic
residue mod p, using Euler’s criterion. This algorithm will almost always rapidly 
determine a quadratic non-residue mod p, but one might just be terribly unlucky 
and the algorithm might fail. 
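A minimal sketch of this randomized search, assuming $p$ is an odd prime: each trial draws $b$ uniformly from $[1, p-1]$ and applies Euler’s criterion. The cap `max_tries` and the optional random-number-generator argument `rng` are our own illustrative choices; with 20 tries the chance of failure is $2^{-20}$, as computed above.

```python
import random

def find_nonresidue(p, max_tries=20, rng=random):
    """Randomized search for a quadratic non-residue mod an odd prime p.

    Each trial succeeds with probability exactly 1/2, so all max_tries trials
    fail with probability 2**(-max_tries); in that case None is returned.
    """
    for _ in range(max_tries):
        b = rng.randint(1, p - 1)              # uniform on [1, p-1]
        if pow(b, (p - 1) // 2, p) == p - 1:   # Euler's criterion: b^((p-1)/2) = -1
            return b
    return None

print(2 ** -20 < 1e-6)  # True: 20 failures in a row is rarer than one in a million
```

Passing a seeded generator, e.g. `find_nonresidue(101, 200, random.Random(0))`, makes a run reproducible.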


It is useful to determine for which primes $p$ a given small integer $a$ is a quadratic residue (mod $p$). We study this for $a = -1$, 2, and $-2$ in the next few sections.


8.3. The residue −1


Theorem 8.3. If $p$ is an odd prime, then $-1$ is a quadratic residue (mod $p$) if and only if $p \equiv 1 \pmod 4$.


We will give five proofs of this result (even though we don’t need more than one!) to highlight how the various ideas in the book dovetail in this key result. It is worth recalling that in exercise 7.4.3(c) we showed that if $p \equiv 1 \pmod 4$, then $((p-1)/2)!$ is a square root of $-1 \pmod p$. We developed more efficient ways of finding a square root of $-1 \pmod p$ in section 7.21 of appendix 7C.
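Both routes to a square root of $-1$ can be compared numerically. The first function below follows exercise 7.4.3(c) as quoted above and needs about $p/2$ multiplications. The second is a standard shortcut included only for comparison (we do not claim it is the method of appendix 7C): if $b$ is a quadratic non-residue then $b^{(p-1)/2} \equiv -1$ by Euler’s criterion, so $b^{(p-1)/4}$ squares to $-1$. Both assume $p$ is a prime with $p \equiv 1 \pmod 4$; the function names are hypothetical.

```python
def sqrt_minus_one_factorial(p):
    """((p-1)/2)! mod p, a square root of -1 mod p when p = 1 (mod 4) is prime."""
    r = 1
    for k in range(2, (p - 1) // 2 + 1):
        r = r * k % p
    return r

def sqrt_minus_one_from_nonresidue(p, b):
    """b^((p-1)/4) mod p; its square is -1 mod p when b is a quadratic non-residue."""
    return pow(b, (p - 1) // 4, p)

p = 13
r = sqrt_minus_one_factorial(p)
print(r, r * r % p)                        # 5 12, since 6! = 720 = 5 (mod 13)
s = sqrt_minus_one_from_nonresidue(p, 2)   # 2 is a non-residue mod 13
print(s, s * s % p)                        # 8 12
```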


Proof #1. Euler’s criterion implies that $(-1/p) \equiv (-1)^{(p-1)/2} \pmod p$. Since each side of the congruence is $-1$ or $1$, and $p$, which is $> 2$, divides their difference, they must be equal and so $(-1/p) = (-1)^{(p-1)/2}$, and the result follows.


Proof #2. In exercise 7.5.2 we saw that $-1 \equiv g^{(p-1)/2} \pmod p$ for any primitive root $g$ modulo $p$. Now if $-1 \equiv (g^k)^2 \pmod p$ for some integer $k$, then $\frac{p-1}{2} \equiv 2k \pmod{p-1}$, and there exists such an integer $k$ if and only if $\frac{p-1}{2}$ is even.




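Proof #3. The number of quadratic non-residues (mod $p$) is $(p-1)/2$, and so, by Wilson’s Theorem and the multiplicativity of the Legendre symbol, we have

$$\left(\frac{-1}{p}\right) = \left(\frac{(p-1)!}{p}\right) = \prod_{1 \le a \le p-1} \left(\frac{a}{p}\right) = (-1)^{(p-1)/2}.$$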


Proof #4. If $a$ is a quadratic residue, then so is $1/a \pmod p$. Therefore we may “pair up” the quadratic residues (mod $p$), except those for which $a \equiv 1/a \pmod p$. The only solutions to $a \equiv 1/a \pmod p$ (that is, $a^2 \equiv 1 \pmod p$) are $a \equiv 1$ and $-1 \pmod p$. Therefore the product of the quadratic residues mod $p$ is congruent to $-(-1/p)$. On the other hand the roots of $x^{(p-1)/2} - 1 \pmod p$ are precisely the quadratic residues mod $p$, and so, taking $x = 0$ in (8.2.1), the product of the quadratic residues mod $p$ is congruent to $(-1)(-1)^{(p-1)/2} \pmod p$. Comparing these yields that $(-1/p) \equiv (-1)^{(p-1)/2} \pmod p$, and the result follows.


Proof #5. (Euler) The first part of Proof #4 implies that

$$\frac{p-1}{2} = \#\{a \pmod p:\ a \text{ is a quadratic residue} \pmod p\}$$

has the same parity as

$$\#\{a \in \{1, -1\}:\ a \text{ is a quadratic residue} \pmod p\} = \frac{1}{2}\left(3 + \left(\frac{-1}{p}\right)\right).$$

Multiplying through by 2 yields $p - 1 \equiv 3 + (-1/p) \pmod 4$, that is, $p \equiv (-1/p) \pmod 4$, and the result follows.


Theorem 8.3 implies that if $p \equiv 1 \pmod 4$, then $(-a/p) = (a/p)$; and if $p \equiv -1 \pmod 4$, then $(-a/p) = -(a/p)$.
Exercise 8.3.1. Let $p$ be a prime $\equiv 3 \pmod 4$ which does not divide the integer $a$. Prove that either there exists $x \pmod p$ for which $x^2 \equiv a \pmod p$ or there exists $y \pmod p$ for which $y^2 \equiv -a \pmod p$, but not both.


Exercise 8.3.2. (a) Prove that every prime factor $p$ of $4n^2 + 1$ satisfies $p \equiv 1 \pmod 4$.
(b) Deduce that there are infinitely many primes $\equiv 1 \pmod 4$.


8.4. The residue 2 
Calculations reveal that the odd primes $p < 100$ for which $(2/p) = 1$ are

p = 7, 17, 23, 31, 41, 47, 71, 73, 79, 89, and 97.

These are exactly the primes $< 100$ that are $\equiv \pm 1 \pmod 8$. This observation is established as fact as follows:


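Theorem 8.4. If $p$ is an odd prime, then

$$\left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \text{ or } -1 \pmod 8, \\ -1 & \text{if } p \equiv 3 \text{ or } -3 \pmod 8. \end{cases}$$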




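Proof. We will evaluate the product

$$S := \prod_{\substack{1 \le m \le p-1 \\ m \text{ even}}} m \pmod p$$

in two different ways. First note that each $m$ in the product can be written as $2k$ with $1 \le k \le \frac{p-1}{2}$, and so

$$S = \prod_{1 \le k \le \frac{p-1}{2}} 2k = 2^{\frac{p-1}{2}} \left(\frac{p-1}{2}\right)!.$$

One can also rewrite each $m$ in the product as $p - n$ where $n$ is odd; and if $m$ is in the range $\frac{p+1}{2} \le m \le p-1$, then $1 \le n \le \frac{p-1}{2}$. Therefore

$$S = \prod_{\substack{1 \le m \le \frac{p-1}{2} \\ m \text{ even}}} m \cdot \prod_{\substack{1 \le n \le \frac{p-1}{2} \\ n \text{ odd}}} (p - n).$$

Let’s suppose there are $r$ such values of $n$, and note that each $p - n \equiv -n \pmod p$. Since the even $m$ and the odd $n$ in these two products together run over every integer in $[1, \frac{p-1}{2}]$ exactly once, we obtain

$$S \equiv \prod_{\substack{1 \le m \le \frac{p-1}{2} \\ m \text{ even}}} m \cdot \prod_{\substack{1 \le n \le \frac{p-1}{2} \\ n \text{ odd}}} (-n) = (-1)^r \left(\frac{p-1}{2}\right)! \pmod p.$$

Comparing the two ways that we have evaluated $S$, and dividing through by $\left(\frac{p-1}{2}\right)!$, we find that

$$2^{\frac{p-1}{2}} \equiv (-1)^r \pmod p.$$

The result follows from Euler’s criterion and verifying that $r$ is even if $p \equiv \pm 1 \pmod 8$, while $r$ is odd if $p \equiv \pm 3 \pmod 8$ (see exercise 8.4.1).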


Exercise 8.4.1. For any odd integer $q$, let $r$ denote the number of positive odd integers $< q/2$. Prove that $r$ is even if $q \equiv \pm 1 \pmod 8$, while $r$ is odd if $q \equiv \pm 3 \pmod 8$.


Gauss’s Lemma (Theorem 8.6 in appendix 8A) cleverly generalizes this proof of Theorem 8.4 to classify the values of $(a/p)$ for any fixed integer $a$.


Calculations reveal that the odd primes $p < 100$ for which $(-2/p) = 1$ are

p = 3, 11, 17, 19, 41, 43, 59, 67, 73, 83, 89, and 97.

These are exactly the primes $< 100$ that are $\equiv 1$ or $3 \pmod 8$. This observation is established as fact by combining Theorems 8.3 and 8.4, which allow us to evaluate $(-2/p)$ by taking $(-2/p) = (-1/p)(2/p)$ for every odd prime $p$.
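The following Python sketch checks these statements numerically for small primes: it computes the count $r$ from the proof of Theorem 8.4 and confirms $2^{(p-1)/2} \equiv (-1)^r \pmod p$, and it reproduces the two lists of primes below 100 quoted above via Euler’s criterion. The helper names are ours, and the prime list comes from naive trial division, which is adequate only for small bounds.

```python
def odd_primes_below(n):
    """Odd primes less than n, by trial division (fine for small n)."""
    return [p for p in range(3, n, 2)
            if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))]

def count_r(p):
    """r = number of odd n with 1 <= n <= (p-1)/2, as in the proof of Theorem 8.4."""
    return len(range(1, (p - 1) // 2 + 1, 2))

def residue_primes(a, bound):
    """Odd primes p < bound (not dividing a) with (a/p) = 1, via Euler's criterion."""
    return [p for p in odd_primes_below(bound)
            if a % p and pow(a, (p - 1) // 2, p) == 1]

# 2^((p-1)/2) = (-1)^r (mod p), and r is even exactly when p = +-1 (mod 8)
print(all(pow(2, (p - 1) // 2, p) == (-1) ** count_r(p) % p
          and (count_r(p) % 2 == 0) == (p % 8 in (1, 7))
          for p in odd_primes_below(1000)))   # True
print(residue_primes(2, 100))    # [7, 17, 23, 31, 41, 47, 71, 73, 79, 89, 97]
print(residue_primes(-2, 100))   # [3, 11, 17, 19, 41, 43, 59, 67, 73, 83, 89, 97]
```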
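Exercise 8.4.2. Prove that if $p$ is an odd prime, then

$$\left(\frac{-2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \text{ or } 3 \pmod 8, \\ -1 & \text{if } p \equiv 5 \text{ or } 7 \pmod 8. \end{cases}$$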
Exercise 8.4.3. Prove that if 2 is a primitive root mod $p$, then $p \equiv 3$ or $5 \pmod 8$.


Exercise 8.4.4.† (a) Prove that if prime $p \mid M_n := 2^n - 1$ where $n > 2$ is prime, then $p \equiv 1 \pmod n$ and $p \equiv \pm 1 \pmod 8$.
(b) Prove that if $p = 2n + 1$ is prime, then $p \mid 2^n - 1$ if and only if $p \equiv \pm 1 \pmod 8$.
(c) Prove that if $p = 2n + 1$ is prime, then $p \mid 2^n + 1$ if and only if $p \equiv \pm 3 \pmod 8$.




(d) Prove that if $q$ and $p = 2q + 1$ are both prime, then $p$ divides $2^q - 1$ if and only if $q \equiv 3 \pmod 4$.
(e) Factor $2^{11} - 1 = 2047$.
Exercise 8.4.5.† In exercise 7.3.2 we proved that if prime $p$ divides $2^{2^k} + 1$, then $p \equiv 1 \pmod{2^{k+1}}$. Now show that $p \equiv 1 \pmod{2^{k+2}}$ if $k \ge 2$.²
8.5. The law of quadratic reciprocity 


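We have already seen that if $p$ is an odd prime, then

$$\left(\frac{-1}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4, \\ -1 & \text{if } p \equiv -1 \pmod 4, \end{cases} \qquad\text{and}\qquad \left(\frac{2}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \text{ or } -1 \pmod 8, \\ -1 & \text{if } p \equiv 3 \text{ or } -3 \pmod 8. \end{cases}$$

To be able to evaluate arbitrary Legendre symbols we will also need the law of quadratic reciprocity.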


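Theorem 8.5 (The law of quadratic reciprocity). If $p$ and $q$ are given distinct odd primes, then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = \begin{cases} 1 & \text{if } p \equiv 1 \pmod 4 \text{ or } q \equiv 1 \pmod 4, \\ -1 & \text{if } p \equiv q \equiv -1 \pmod 4. \end{cases}$$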
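As a sanity check (not a proof), the Python sketch below evaluates both Legendre symbols by Euler’s criterion and confirms the two cases of Theorem 8.5 for every pair of distinct odd primes below 100. The names `legendre` and `reciprocity_holds` are illustrative.

```python
ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
              47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed by Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def reciprocity_holds(p, q):
    """Check Theorem 8.5 for distinct odd primes p and q."""
    expected = -1 if p % 4 == 3 and q % 4 == 3 else 1
    return legendre(p, q) * legendre(q, p) == expected

print(all(reciprocity_holds(p, q)
          for p in ODD_PRIMES for q in ODD_PRIMES if p != q))   # True
```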


These rules, taken together, allow us to rapidly evaluate any Legendre symbol. For example, to evaluate $(m/p)$, we first reduce $m$ mod $p$, so that $(m/p) = (n/p)$ where $n \equiv m \pmod p$ and $|n| < p$. Next we factor $n$ and, by the multiplicativity of the Legendre symbol, we can evaluate $(n/p)$ in terms of $(-1/p)$, $(2/p)$, and the $(q/p)$ for those primes $q$ dividing $n$. We can easily determine the values of $(-1/p)$ and $(2/p)$ from determining $p \pmod 8$, and then we need to evaluate each $(q/p)$ where $q \le |n| < p$. We do this by the law of quadratic reciprocity, since $(q/p) = \pm(p/q)$ depending only on the values of $p$ and $q$ mod 4.³ We repeat the procedure on each $(p/q)$. Clearly this process will quickly finish as the numbers involved are always getting smaller. Let us work through some examples.


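$$\begin{aligned}
\left(\frac{111}{71}\right) &= \left(\frac{40}{71}\right) = \left(\frac{2}{71}\right)^3 \left(\frac{5}{71}\right) && \text{as } 111 \equiv 40 \pmod{71} \text{ and } 40 = 2^3 \cdot 5,\\
&= 1^3 \cdot \left(\frac{71}{5}\right) && \text{as } 71 \equiv -1 \pmod 8 \text{ and } 5 \equiv 1 \pmod 4,\\
&= \left(\frac{1}{5}\right) = 1 && \text{as } 71 \equiv 1 \pmod 5.
\end{aligned}$$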


²We can use this to “demystify” Euler’s factorization of $F_5$: Exercise 8.4.5 implies that any prime factor $p$ of $F_5$ must be of the form $128m + 1$. This is divisible by 3, 5, and 3 for $m = 1$, 3, and 4, respectively, so is not prime. If $m = 2$, then $p = 257 = F_3$, which we proved is coprime with $F_5$ in section [...]. Finally, if $m = 5$, then $p = 641$ is a prime factor of $F_5$.

³Note that if $(p/q)(q/p) = \eta$ $(= \pm 1)$ by the law of quadratic reciprocity, then $(q/p) = \eta\,(p/q)$.




There is more than one way to proceed with these rules: 


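$$\begin{aligned}
\left(\frac{111}{71}\right) &= \left(\frac{-1}{71}\right)\left(\frac{31}{71}\right) && \text{as } 111 \equiv -31 \pmod{71},\\
&= (-1)\cdot(-1)\cdot\left(\frac{71}{31}\right) && \text{as } 71 \equiv 31 \equiv -1 \pmod 4,\\
&= \left(\frac{9}{31}\right) = \left(\frac{3}{31}\right)^2 = 1 && \text{as } 71 \equiv 9 = 3^2 \pmod{31}.
\end{aligned}$$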


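A slightly larger example is

$$\begin{aligned}
\left(\frac{869}{311}\right) &= \left(\frac{247}{311}\right) = \left(\frac{13}{311}\right)\left(\frac{19}{311}\right) = \left(\frac{311}{13}\right)\cdot(-1)\left(\frac{311}{19}\right) = \left(\frac{-1}{13}\right)\cdot(-1)\left(\frac{7}{19}\right)\\
&= 1\cdot(-1)\cdot(-1)\left(\frac{19}{7}\right) = \left(\frac{5}{7}\right) = \left(\frac{-2}{7}\right) = -1.
\end{aligned}$$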
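The procedure described above (reduce, factor, then apply $(-1/p)$, $(2/p)$, and reciprocity to each odd prime factor) can be sketched in Python as follows. The naive trial-division `factor` helper is an assumption made to keep the sketch self-contained, and it is exactly the step that becomes impractical for large numbers. We reduce to the least residue in absolute value, which satisfies the requirement $|n| < p$. The function names are hypothetical.

```python
def factor(n):
    """Prime factorization of an integer n >= 1 by trial division: {prime: exponent}."""
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def legendre_by_reciprocity(m, p):
    """(m/p) for an odd prime p: reduce m mod p, factor, then use (-1/p),
    (2/p) and quadratic reciprocity on each odd prime factor q."""
    n = m % p
    if n > p // 2:
        n -= p                                   # least residue in absolute value
    if n == 0:
        return 0
    result = 1
    if n < 0:
        result *= 1 if p % 4 == 1 else -1        # (-1/p), Theorem 8.3
        n = -n
    for q, e in factor(n).items():
        if e % 2 == 0:
            continue                             # (q/p)^e = 1 when e is even
        if q == 2:
            result *= 1 if p % 8 in (1, 7) else -1   # (2/p), Theorem 8.4
        else:
            sign = -1 if p % 4 == 3 and q % 4 == 3 else 1
            result *= sign * legendre_by_reciprocity(p, q)   # (q/p) = +-(p/q)
    return result

print(legendre_by_reciprocity(111, 71))    # 1
print(legendre_by_reciprocity(869, 311))   # -1
```

Because this sketch uses the least absolute residue, it evaluates $(869/311)$ via $869 \equiv -64 \pmod{311}$ and never needs to factor 247; that is a lucky feature of this example, not a general escape from factoring.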


Although longer, each step is straightforward except when we factored $247 = 13 \times 19$ (a factorization which is not obvious for most of us; imagine how difficult factoring might be when we are dealing with much larger numbers). Indeed, this is an efficient procedure provided that one is capable of factoring the numbers $n$ that arise. Although this may be the case for small examples, it is not practical for large examples. We can bypass this potential difficulty by using the Jacobi symbol, a generalization of the Legendre symbol, which we will discuss in section 8.7.




In the next section we will prove the law of quadratic reciprocity, justifying the algorithm used above to determine the value of any given Legendre symbol.

The law of quadratic reciprocity is easily used to determine various other rules. For example, when is 3 a square mod $p$? This is the same as asking when $(3/p) = 1$ (for primes $p > 3$). Now by quadratic reciprocity we have two cases:

• If $p \equiv 1 \pmod 4$, then $(3/p) = (p/3)$, and $(p/3) = 1$ when $p \equiv 1 \pmod 3$, so we have $(3/p) = 1$ when $p \equiv 1 \pmod{12}$ (using the Chinese Remainder Theorem).

• If $p \equiv -1 \pmod 4$, then $(3/p) = -(p/3)$, and $(p/3) = -1$ when $p \equiv -1 \pmod 3$, so we have $(3/p) = 1$ when $p \equiv -1 \pmod{12}$ (using the Chinese Remainder Theorem).

We have therefore proved that $(3/p) = 1$ if and only if $p \equiv 1$ or $-1 \pmod{12}$.
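A quick numerical check of this rule for all primes $5 \le p < 1000$, using a self-contained Legendre-symbol helper based on Euler’s criterion (the names are illustrative):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

# primes 5 <= p < 1000, by trial division
primes = [p for p in range(5, 1000, 2)
          if all(p % d for d in range(3, int(p ** 0.5) + 1, 2))]

# (3/p) = 1 exactly when p = 1 or -1 (mod 12)
print(all((legendre(3, p) == 1) == (p % 12 in (1, 11)) for p in primes))  # True
```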


Exercise 8.5.1. Determine (a) (32); (b) (323); (c) (32); (d) (33); (e) =). 

Exercise 8.5.2. (a) Show that if prime $p \equiv 1 \pmod 5$, then 5 is a quadratic residue mod $p$.
(b) Show that if prime $p \equiv 3 \pmod 5$, then 5 is a quadratic non-residue mod $p$.
(c) Determine all odd primes $p$ for which $(5/p) = -1$.


Exercise 8.5.3. Prove that if $p := 2^n - 1$ is prime with $n > 2$, then $(3/p) = -1$.


Exercise 8.5.4.† Suppose that $F_m = 2^{2^m} + 1$ with $m \ge 2$ is prime. Prove that $3^{(F_m - 1)/2} \equiv 5^{(F_m - 1)/2} \equiv -1 \pmod{F_m}$.


Exercise 8.5.5.† (a) Determine all odd primes $p$ for which $(7/p) = 1$.
(b) Find all primes $p$ such that there exists $x \pmod p$ for which $2x^2 - 2x - 3 \equiv 0 \pmod p$.




Exercise 8.5.6. Show that if $p$ and $q = p + 2$ are “twin primes”, then $p$ is a quadratic residue mod $q$ if and only if $q$ is a quadratic residue mod $p$.


Exercise 8.5.7. Prove that $(-3/p) = (p/3)$ for all primes $p$.


8.6. Proof of the law of quadratic reciprocity 


Suppose that $p < q$ are odd primes, and let $n = pq$. Given residue classes $a \pmod p$ and $b \pmod q$ there exists a unique residue class $r \pmod n$ for which $r \equiv a \pmod p$ and $r \equiv b \pmod q$, by the Chinese Remainder Theorem. Let $r(a,b)$ be the least residue of $r$ mod $n$ in absolute value and let $m(a,b) = |r(a,b)|$, so that $1 \le m(a,b) < n/2$, and $m(a,b) = r(a,b)$ or $-r(a,b)$. We claim that

$$\left\{ m(a,b):\ 1 \le a \le p-1 \text{ and } 1 \le b \le \tfrac{q-1}{2} \right\} = \left\{ m:\ 1 \le m < \tfrac{n}{2} \text{ with } (m,n) = 1 \right\},$$

since the two sets both have $\phi(n)/2$ elements, each such $m(a,b) \in [1, n/2)$ with $(m(a,b), n) = 1$, and the $m(a,b)$ are distinct. This last assertion holds since if $m(a,b) = m(a',b')$, then $r(a,b) \equiv \pm r(a',b') \pmod n$, so that $b \equiv \pm b' \pmod q$. As $1 \le b, b' \le (q-1)/2$, this implies that $b = b'$ and that the sign is “$+$”, and therefore $a \equiv a' \pmod p$, implying that $a = a'$.


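Since each $m(a,b) = \pm r(a,b)$, we deduce that there exists $\sigma = -1$ or $1$ such that

$$\sigma \prod_{\substack{1 \le a \le p-1 \\ 1 \le b \le (q-1)/2}} r(a,b) = \prod_{\substack{1 \le a \le p-1 \\ 1 \le b \le (q-1)/2}} m(a,b) = \prod_{\substack{1 \le m < n/2 \\ (m,n) = 1}} m. \tag{8.6.1}$$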


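We will calculate the two sides in this identity, mod $p$ and mod $q$, and compare. As $r(a,b) \equiv a \pmod p$, the product on the left-hand side of (8.6.1) is

$$\prod_{\substack{1 \le a \le p-1 \\ 1 \le b \le (q-1)/2}} r(a,b) \equiv \prod_{1 \le b \le (q-1)/2} \ \prod_{1 \le a \le p-1} a = \left((p-1)!\right)^{(q-1)/2} \equiv (-1)^{(q-1)/2} \pmod p,$$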

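using Wilson’s Theorem. We rewrite the right-hand side of (8.6.1), multiplying top and bottom by the integers $m \in [1, n/2)$ that are divisible by $q$, to obtain

$$\prod_{\substack{1 \le m < n/2 \\ (m,n) = 1}} m = \prod_{\substack{1 \le m < n/2 \\ (m,p) = 1}} m \Bigg/ \prod_{\substack{1 \le m < n/2 \\ q \mid m}} m.$$

We partition the $m$’s in the numerator into intervals of length $p$, because

$$\prod_{\substack{ip < m < (i+1)p \\ (m,p) = 1}} m = \prod_{j=1}^{p-1} (ip + j) \equiv \prod_{j=1}^{p-1} j = (p-1)! \equiv -1 \pmod p,$$

by Wilson’s Theorem. Applying this for $0 \le i \le (q-3)/2$ we get a contribution of $(-1)^{(q-1)/2}$ to the numerator. The remaining integers in the numerator contribute

$$\prod_{\frac{q-1}{2}p < m < \frac{n}{2}} m = \prod_{j=1}^{(p-1)/2} \left(\frac{q-1}{2}\,p + j\right) \equiv \prod_{j=1}^{(p-1)/2} j = \left(\frac{p-1}{2}\right)! \pmod p.$$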




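On the other hand the $m$’s in the denominator can be written as $qk$ with $1 \le k \le (p-1)/2$, and so

$$\prod_{\substack{1 \le m < n/2 \\ q \mid m}} m = \prod_{1 \le k \le (p-1)/2} qk = q^{(p-1)/2} \left(\frac{p-1}{2}\right)! \equiv \left(\frac{q}{p}\right) \left(\frac{p-1}{2}\right)! \pmod p,$$

by Euler’s criterion. Cancelling the $((p-1)/2)!$ from the numerator and denominator, we deduce that the right-hand side of (8.6.1) is $\equiv (-1)^{(q-1)/2} (q/p) \pmod p$. Comparing our calculation of the left- and right-hand sides of (8.6.1) mod $p$, we obtain

$$\sigma (-1)^{(q-1)/2} \equiv (-1)^{(q-1)/2} \left(\frac{q}{p}\right) \pmod p. \tag{8.6.2}$$

Since both sides are 1 or $-1$ and are congruent mod $p$, they must be equal, and so we deduce that

$$\sigma = \left(\frac{q}{p}\right).$$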


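Next we reduce (8.6.1) mod $q$. For the right-hand side we proceed entirely analogously to how we did mod $p$, with the roles of $p$ and $q$ reversed, and so obtain

$$(-1)^{(p-1)/2} \left(\frac{p}{q}\right) \pmod q.$$

For the left-hand side of (8.6.1) mod $q$, we note that each $r(a,b) \equiv b \pmod q$, so that

$$\prod_{\substack{1 \le a \le p-1 \\ 1 \le b \le (q-1)/2}} r(a,b) \equiv \prod_{1 \le a \le p-1} \ \prod_{1 \le b \le (q-1)/2} b = \left(\left(\frac{q-1}{2}\right)!\right)^{p-1} \pmod q.$$

In exercise 7.4.3 we saw, using Wilson’s Theorem, that $\left(\left(\frac{q-1}{2}\right)!\right)^2 \equiv (-1)^{(q+1)/2} \pmod q$,⁴ and therefore

$$\prod_{\substack{1 \le a \le p-1 \\ 1 \le b \le (q-1)/2}} r(a,b) \equiv \left(\left(\left(\frac{q-1}{2}\right)!\right)^2\right)^{(p-1)/2} \equiv (-1)^{\frac{p-1}{2}\cdot\frac{q+1}{2}} \pmod q.$$

Substituting this and the above into (8.6.1) (recalling that $\sigma = (q/p)$), we obtain

$$\left(\frac{q}{p}\right) (-1)^{\frac{p-1}{2}\cdot\frac{q+1}{2}} \equiv (-1)^{(p-1)/2} \left(\frac{p}{q}\right) \pmod q. \tag{8.6.3}$$

Again both sides are 1 or $-1$ and are congruent mod $q$, so they must be equal. Multiplying both sides through by $(-1)^{(p-1)/2} (q/p)$ implies that

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q+3}{2}}.$$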


⁴See the solution to exercise 7.4.3 at the end of the book for a proof.




From here we work through the four cases for $p$ and $q$ mod 4 and deduce the law of quadratic reciprocity (Theorem 8.5).


There are many proofs of the law of quadratic reciprocity, 246 at the last count (see the list at http://www.rzuser.uni-heidelberg.de/~hb3/fchrono.html). In this chapter’s appendices we present two of the best: the original proof due to Gauss and an elegant proof due to Eisenstein. We also discuss two other proofs in the exercises and then two sophisticated but shorter proofs in chapter 14.


8.7. The Jacobi symbol 


The Jacobi symbol is defined as follows: If $m$ is a positive odd integer, we write $m = \prod_p p^{e_p}$, where the $p$ are distinct odd primes, and then

$$\left(\frac{a}{m}\right) := \prod_p \left(\frac{a}{p}\right)^{e_p}.$$

This is defined only for odd $m$, not for even $m$.


If $a$ is a square modulo $m$, then, by the Chinese Remainder Theorem, $a$ is a square modulo every prime $p$ dividing $m$; that is, $(a/p) = 0$ or $1$ for all $p \mid m$, and so $(a/m) = 0$ or $1$. However the converse is not always true; for example, 2 is not a square mod 15 as $(2/3) = -1$, and yet

$$\left(\frac{2}{15}\right) = \left(\frac{2}{3}\right)\left(\frac{2}{5}\right) = (-1)\cdot(-1) = 1.$$


Exercise 8.7.1. Suppose that $m$ is an odd positive integer.
(a) Prove that $(a/m) = (b/m)$ whenever $a \equiv b \pmod m$.
(b) Prove that $(ab/m) = (a/m)(b/m)$.
(c) Prove that if $(a/m) = -1$, then $a$ is not a square mod $m$.
(d) Prove that $(a/m) = 0$ if and only if $(a,m) > 1$.


Exercise 8.7.2. (a) Prove that $\sum_{a \,(\mathrm{mod}\ m)} (a/m) = 0$ for every non-square odd integer $m > 2$.
(b) For how many residues $a$ mod $m$ do we have $(a/m) = 1$?
(c) For how many residues $a$ mod $m$ do we have $(a/m) = -1$?


Exercise 8.7.3. Show that if n > 1, then (ax) = 1; 


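Theorems 8.3, 8.4, and 8.5 can all be extended to the Jacobi symbol (as we will prove at the end of this section): If $m$ and $n$ are odd, coprime integers $> 1$, then

$$\left(\frac{-1}{n}\right) = \begin{cases} 1 & \text{if } n \equiv 1 \pmod 4, \\ -1 & \text{if } n \equiv -1 \pmod 4, \end{cases} \tag{8.7.1}$$

$$\left(\frac{2}{n}\right) = \begin{cases} 1 & \text{if } n \equiv 1 \text{ or } -1 \pmod 8, \\ -1 & \text{if } n \equiv 3 \text{ or } -3 \pmod 8, \end{cases} \tag{8.7.2}$$

and the law of quadratic reciprocity

$$\left(\frac{m}{n}\right)\left(\frac{n}{m}\right) = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}. \tag{8.7.3}$$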




We can use these three rules to easily evaluate $(m/n)$ for any odd coprime integers $m$ and $n$. One begins by selecting $M \equiv m \pmod n$ as conveniently as possible, usually with $|M| < n$. Then we factor $M = \pm 2^k \ell$ where $\ell$ is an odd positive integer $< n$, so that $(m/n) = (M/n) = (\pm 1/n)(2/n)^k(\ell/n)$. We can evaluate the first two Jacobi symbols using the first two rules above (which depend only on the value of $n \pmod 8$), and then we know that $(\ell/n) = \pm(n/\ell)$ by the third rule. To evaluate $(n/\ell)$ we repeat this process, but now with a smaller pair of numbers, so that the algorithm will terminate after finitely many steps.


This algorithm only involves dividing out powers of 2 and a possible minus sign, so it goes fast and avoids serious factoring; in fact it is guaranteed to go at least as fast as the Euclidean algorithm since it involves very similar steps.⁵ Here is a first straightforward example using the Jacobi symbol, instead of the Legendre symbol:


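$$\left(\frac{106}{71}\right) = \left(\frac{35}{71}\right) = -\left(\frac{71}{35}\right) = -\left(\frac{1}{35}\right) = -1.$$

(Note that $(71/35)$ is not a Legendre symbol as 35 is not prime, but it is a Jacobi symbol.) Now let’s revisit the example $(869/311)$ from section 8.5 and avoid factoring 247:

$$\left(\frac{869}{311}\right) = \left(\frac{247}{311}\right) = -\left(\frac{311}{247}\right) = -\left(\frac{64}{247}\right) = -1.$$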


We did not need to factor 247, and each step of the algorithm was straightforward. 
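The algorithm can be sketched in Python as follows, assuming $n$ is a positive odd integer (the function name is ours). One design choice differs slightly from the text: we always reduce to the least non-negative residue, so the rule (8.7.1) for $(-1/n)$ is never needed, while (8.7.2) removes factors of 2 and (8.7.3) swaps numerator and denominator. No factorization is performed.

```python
def jacobi(m, n):
    """Jacobi symbol (m/n) for a positive odd integer n, using only the rules
    (8.7.2) for factors of 2 and (8.7.3) for swapping; no factoring needed."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    m %= n
    result = 1
    while m != 0:
        while m % 2 == 0:            # pull out a factor 2 using (8.7.2)
            m //= 2
            if n % 8 in (3, 5):
                result = -result
        m, n = n, m                  # reciprocity (8.7.3); both are odd here
        if m % 4 == 3 and n % 4 == 3:
            result = -result
        m %= n
    return result if n == 1 else 0  # n > 1 at the end means gcd(m, n) > 1

print(jacobi(106, 71))    # -1
print(jacobi(869, 311))   # -1
print(jacobi(2, 15))      # 1, although 2 is not a square mod 15
```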


Exercise 8.7.4. Determine (a) (33); (b) (333); (c) (333); (d) (54%). 


Proof of (8.7.1), (8.7.2), and (8.7.3). We proceed by induction on the number of prime factors of $m$ and $n$. The results follow when $m$ and $n$ each have one prime factor by Theorems [...] and [...], respectively. Otherwise we write $n = ap$ for some prime $p$ dividing $n$ (swapping the roles of $m$ and $n$ if necessary).


Exercise 8.7.5. Prove that $\frac{a-1}{2} + \frac{b-1}{2} \equiv \frac{ab-1}{2} \pmod 2$ for any odd integers $a$, $b$.


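Equation (8.7.1) can be rephrased as $(-1/n) = (-1)^{(n-1)/2}$. By induction, using the multiplicativity of the denominator of the Jacobi symbol,

$$\left(\frac{-1}{n}\right) = \left(\frac{-1}{a}\right)\left(\frac{-1}{p}\right) = (-1)^{\frac{a-1}{2}} (-1)^{\frac{p-1}{2}} = (-1)^{\frac{n-1}{2}},$$

by exercise 8.7.5.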


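Similarly by induction and multiplicativity of the numerator and denominator,

$$\begin{aligned}
\left(\frac{m}{n}\right)\left(\frac{n}{m}\right) &= \left(\frac{m}{a}\right)\left(\frac{m}{p}\right)\left(\frac{a}{m}\right)\left(\frac{p}{m}\right) = \left(\frac{m}{a}\right)\left(\frac{a}{m}\right) \cdot \left(\frac{m}{p}\right)\left(\frac{p}{m}\right)\\
&= (-1)^{\frac{m-1}{2}\cdot\frac{a-1}{2}} (-1)^{\frac{m-1}{2}\cdot\frac{p-1}{2}} = (-1)^{\frac{m-1}{2}\left(\frac{a-1}{2} + \frac{p-1}{2}\right)} = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}},
\end{aligned}$$

by exercise 8.7.5.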


⁵As in the “speeded up” version of the Euclidean algorithm, given in section of appendix 1B.




If $(2/a) = (2/p)$, then $a \equiv \pm p \pmod 8$, so that $n = ap \equiv \pm 1 \pmod 8$, and therefore $(2/n) = (2/a)(2/p) = (2/p)^2 = 1$. If $(2/a) = -(2/p)$, then $a \equiv \pm 3p \pmod 8$, so that $n = ap \equiv \pm 3 \pmod 8$, and therefore $(2/n) = (2/a)(2/p) = -(2/p)^2 = -1$.


Gauss gave a different proof of (8.7.2), tying the question directly into finding 
solutions to quadratic equations. This foreshadows Gauss’s proof of the full law of 
quadratic reciprocity, which we will give in appendix 8C. 


Gauss’s induction step for integers $n \equiv \pm 3 \pmod 8$. We suppose that (8.7.2) is true for all odd integers $m < n$ and that $n \equiv \pm 3 \pmod 8$. If $n = ab$ is composite with $1 < a, b < n$, then $(2/n) = (2/a)(2/b)$ and the result for $n$ follows by applying the induction hypothesis with $m = a$ and with $m = b$.

Therefore we may suppose that $n = p$ is prime and assume that $(2/p) = 1$. Let $a$ be the smallest odd positive integer for which $a^2 \equiv 2 \pmod p$, so that $1 \le a \le p-1$ (for if $b$ is the smallest positive integer for which $b^2 \equiv 2 \pmod p$, then let $a = b$ if $b$ is odd, and $a = p - b$ if $b$ is even), and write $a^2 - 2 = pr$. Evidently $pr = a^2 - 2 \equiv -1 \pmod 8$, and so $r \equiv p^2 r = p(pr) \equiv -p \equiv \pm 3 \pmod 8$. Now $a^2 \equiv 2 \pmod r$, and so $(2/r) = 1$ with $r = (a^2 - 2)/p < p$ and $r \equiv \pm 3 \pmod 8$. This contradicts the induction hypothesis, and so our assumption is wrong. Therefore $(2/p) = -1$.


Exercise 8.7.6. Prove an analogous induction step for integers $n \equiv 5$ or $7 \pmod 8$ when establishing the value of $(-2/n)$.


Exercise 8.7.7 (A useful reformulation of the law of quadratic reciprocity). For a given odd, squarefree integer $n > 1$ let $n^* = (-1/n)\, n$. Prove that $n^* \equiv 1 \pmod 4$ and that we have $(n^*/m) = (m/n)$ for all odd integers $m > 1$.


8.8. The squares modulo m 


To determine the squares mod $m$, that is, the residues $a \pmod m$ for which there exists $b \pmod m$ with $b^2 \equiv a \pmod m$, we may use the Chinese Remainder Theorem: we know that $a$ is a square mod $m$ if and only if $a$ is a square modulo every prime power factor of $m$. So it is sufficient to understand the squares modulo every prime power.

Above we have understood the squares modulo every prime $p$. We now “lift” these squares to determine the squares modulo every prime power, $p^k$. Let’s begin by studying the squares mod $p^2$:

The squares mod 9 are 0, 1, 4, and 7 mod 9 (these are the least residues of $0^2, 1^2, \dots, 8^2 \pmod 9$, excluding repetitions). The non-zero residues 1, 4, and 7 are all $\equiv 1 \pmod 3$; in fact they are all of the residue classes $a \pmod 9$ for which $a \equiv 1 \pmod 3$. We have seen that $1 \pmod 3$ is the only quadratic residue mod 3.


Similarly mod 25 we have the squares

0, 1, 4, 9, 16, 11, 24, 14, 6, 21, and 19 (mod 25).




The non-zero squares here are 1, 6, 11, 16, and 21 (mod 25), the residue classes $a \pmod{25}$ for which $a \equiv 1 \pmod 5$, and 4, 9, 14, 19, and 24 (mod 25), the residue classes $a \pmod{25}$ for which $a \equiv 4 \pmod 5$. Moreover 1 and 4 (mod 5) are the quadratic residues mod 5.
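The squares listed above for the moduli 9 and 25 can be regenerated by brute force, together with a check of the pattern just observed: the nonzero squares mod 25 are exactly the residues that reduce to a quadratic residue mod 5. The code and its names are illustrative.

```python
def squares_mod(m):
    """Sorted list of the distinct values x^2 mod m."""
    return sorted({x * x % m for x in range(m)})

print(squares_mod(9))    # [0, 1, 4, 7]
print(squares_mod(25))   # [0, 1, 4, 6, 9, 11, 14, 16, 19, 21, 24]

# the nonzero squares mod 25 are exactly the a with a = 1 or 4 (mod 5)
nonzero = [s for s in squares_mod(25) if s]
print(nonzero == [a for a in range(1, 25) if a % 5 in (1, 4)])   # True
```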


A pattern begins to emerge. Define $a$ to be a quadratic residue (mod $m$) if $(a, m) = 1$ and there exists $b \pmod m$ for which $b^2 \equiv a \pmod m$.


Proposition 8.8.1. Let $p$ be a prime. If $r$ is a quadratic residue mod $p^k$, then $r$ is a quadratic residue mod $p^{k+1}$ whenever $k \ge 1$, except perhaps when $p^k = 2$ or 4.


Proof. There exists an integer $x$ for which $x^2 \equiv r \pmod{p^k}$, and $(x, p) = 1$ as $(r, p) = 1$. We let $n$ be that integer for which $x^2 = r + np^k$.

Now if $p$ is odd, then, for any integer $j$, we have

$$(x - jp^k)^2 = x^2 - 2jxp^k + j^2p^{2k} \equiv r + (n - 2jx)p^k \pmod{p^{k+1}}.$$

This is $\equiv r \pmod{p^{k+1}}$ if and only if $2jx \equiv n \pmod p$, which holds if and only if $j \equiv n/2x \pmod p$ (as $(2x, p) = 1$). Therefore $r$ is a square mod $p^{k+1}$, and our proof yields that there is a unique $X \pmod{p^{k+1}}$ for which $X \equiv x \pmod{p^k}$ and $X^2 \equiv r \pmod{p^{k+1}}$, namely $X \equiv x - jp^k \pmod{p^{k+1}}$ where $j \equiv n/2x \pmod p$.

If $p = 2$, then $x^2 = r + n \cdot 2^k$ and $x$ is odd, so that $x^2 - nx2^k \equiv r \pmod{2^{k+1}}$. Therefore

$$(x - n2^{k-1})^2 = x^2 - nx2^k + n^2 2^{2k-2} \equiv r \pmod{2^{k+1}},$$

provided the exponent $2k - 2 \ge k + 1$; that is, $k \ge 3$.
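The odd-prime case of this proof is a constructive lifting step. The Python sketch below assumes $p$ is an odd prime, $p \nmid r$, and $x^2 \equiv r \pmod{p^k}$; the function name `lift_sqrt` is hypothetical. It computes $n = (x^2 - r)/p^k$, then $j \equiv n/(2x) \pmod p$, and returns $X \equiv x - jp^k \pmod{p^{k+1}}$, exactly as in the proof. Repeating it lifts a square root mod $p$ to one mod any power of $p$. Modular inverses via `pow(..., -1, p)` need Python 3.8 or later.

```python
def lift_sqrt(x, r, p, k):
    """Lift x with x^2 = r (mod p^k) to X with X = x (mod p^k) and X^2 = r (mod p^(k+1)).

    Assumes p is an odd prime and p does not divide r (so p does not divide 2x).
    """
    pk = p ** k
    assert (x * x - r) % pk == 0, "x must be a square root of r mod p^k"
    n = (x * x - r) // pk                 # x^2 = r + n p^k
    j = n * pow(2 * x, -1, p) % p         # j = n / (2x) (mod p)
    return (x - j * pk) % (pk * p)        # X = x - j p^k (mod p^(k+1))

# Example: 3^2 = 9 = 2 (mod 7); lift this square root of 2 to higher powers of 7.
x = 3
for k in range(1, 5):
    x = lift_sqrt(x, 2, 7, k)
    print(k + 1, x, x * x % 7 ** (k + 1))   # the last column is always 2
```

The first two lines printed are `2 10 2` and `3 108 2`, since $10^2 = 100 = 2 + 2\cdot 49$ and $108^2 = 11664 = 2 + 34 \cdot 343$.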


Exercise 8.8.1. Deduce that an integer $r$ is a quadratic residue mod $p^k$ if and only if $r$ is a quadratic residue mod $p$, when $p$ is odd, and if and only if $r \equiv 1 \pmod{\gcd(2^k, 8)}$ when $p = 2$.


This implies that exactly half of the reduced residue classes mod $p^k$ are quadratic residues when $p$ is odd, and exactly one quarter when $p = 2$ and $k \ge 3$. Using the Chinese Remainder Theorem we therefore deduce from exercise 8.8.1 the following:
Corollary 8.8.1. Suppose that $(a, m) = 1$. Then $a$ is a square mod $m$ if and only if $(a/p) = 1$ for every odd prime $p$ dividing $m$, and $a \equiv 1 \pmod{\gcd(m, 8)}$.
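Corollary 8.8.1 gives a test for squareness mod $m$ that needs only the odd prime factors of $m$ and Euler’s criterion. The Python sketch below (helper names ours; trial-division factoring, so small $m$ only) implements that test and compares it with brute force over all $a$ coprime to $m$, for $m < 120$.

```python
from math import gcd

def odd_prime_factors(m):
    """Set of odd primes dividing m, by trial division."""
    ps, d = set(), 2
    while d * d <= m:
        while m % d == 0:
            if d % 2:
                ps.add(d)
            m //= d
        d += 1
    if m > 1 and m % 2:
        ps.add(m)
    return ps

def is_square_by_corollary(a, m):
    """Corollary 8.8.1, assuming gcd(a, m) = 1."""
    if (a - 1) % gcd(m, 8):                 # need a = 1 (mod gcd(m, 8))
        return False
    return all(pow(a, (p - 1) // 2, p) == 1 for p in odd_prime_factors(m))

def is_square_brute_force(a, m):
    return any(x * x % m == a % m for x in range(m))

print(all(is_square_by_corollary(a, m) == is_square_brute_force(a, m)
          for m in range(1, 120) for a in range(m) if gcd(a, m) == 1))   # True
```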
Exercise 8.8.2. Suppose that $(a,n) = 1$ and that $b^2 \equiv a \pmod n$. Prove that the set of solutions $x \pmod n$ to $x^2 \equiv a \pmod n$ is given by the values $br \pmod n$ as $r$ runs through the solutions to $r^2 \equiv 1 \pmod n$. (Determining the square roots of 1 (mod $n$) is discussed in section B.8].)


Additional exercises 


Exercise 8.9.1. Let $p$ be an odd prime where $p \nmid a$. Show that the congruence $ax^2 + bx + c \equiv 0 \pmod p$ has a solution $x \pmod p$ if and only if $b^2 - 4ac$ is a square mod $p$.


Exercise 8.9.2.† Prove that $m^2$ and $m^2 + 1$ are both squares mod $p$, for $m$ equal to at least one of $a$, $a+1$, or $a^2+a+1$, for any integer $a$. (This generalizes exercise a).)


Exercise 8.9.3. The polynomial $x^4 - 4x^2 + 1$ is irreducible in $\mathbb{Q}[x]$ by Theorem B.4].
(a) Prove that $x^4 - 4x^2 + 1$ can be factored mod $p$ as $(x^2 - \alpha)(x^2 - \beta)$ or $(x^2 - ax + 1)(x^2 + ax + 1)$ or $(x^2 - ax - 1)(x^2 + ax - 1)$ if 3 or 6 or 2 is a square mod $p$, respectively.




(b) Deduce that $x^4 - 4x^2 + 1 \pmod p$ is reducible for every prime $p$.
(c)† Prove that every polynomial of the form $x^4 + ax^2 + b^2$ factors into two quadratics mod $p$, for every prime $p$.


Exercise 8.9.4. Prove that if $p \equiv 1 \pmod 4$, then $x^4 + 4$ factors into four linear factors mod $p$.


Exercise 8.9.5. Let $f(\cdot)$ be the totally multiplicative function for which $f(3) = 1$ and $f(p) = (p/3)$ if $p \ne 3$.
(a) Give a formula for $f(n)$ for an arbitrary integer $n$.
(b)† For any given large constant $B$, suppose that $p$ is a prime for which $(q/p) = f(q)$ for every prime $q < B$. Show that there are no three consecutive squares mod $p$ that are all $< B$.


This shows that the result in exercise b) cannot be extended to three consecutive integers provided the hypothesis in (b) holds. This hypothesis will be justified in exercise 8.17.2 of appendix 8D.


A . = d\ _ 
Exercise 8.9.6. Show that if (2) = —1, then Dain (4) =0. 
Exercise 8.9.7. Suppose that $a$ and $b$ are integers and $\{x_n : n \ge 0\}$ is the second-order linear recurrence sequence given by (0.1.2) with $x_0 = 0$ and $x_1 = 1$. Using exercise 0.4.10(b), prove that if odd prime $p$ divides some $x_n$ with $n$ odd, then $(-b/p) = 1$. Deduce that if $(-b/p) = -1$ and $p$ divides $x_n$, then $n$ is even.


Exercise 8.9.8. (a) Suppose that $p^k$ is an odd prime power. Prove that there are $1 + (a/p)$ residue classes $b \pmod{p^k}$ for which $b^2 \equiv a \pmod{p^k}$.
(b) Suppose that $n$ is an odd positive integer. Prove that there are $\prod_{p \text{ prime},\ p \mid n} \left(1 + (a/p)\right)$ residue classes $b \pmod n$ for which $b^2 \equiv a \pmod n$.
(c) Show that this equals $\sum_{d \mid n} (a/d)$, where the sum is restricted to squarefree integers $d$.


Exercise 8.9.9.† Let $p$ be a given odd prime.
(a) Prove that for every $m \pmod p$ there exist $a$ and $b$ mod $p$ such that $a^2 + b^2 \equiv m \pmod p$.
(b) Deduce that there are three squares, not all divisible by $p$, whose sum is divisible by $p$.
(c) Generalize this argument to show that if $a$, $b$, and $c$ are not divisible by $p$, then there are at least $p$ solutions $x, y, z \pmod p$ to $ax^2 + by^2 + cz^2 \equiv 0 \pmod p$.


Exercise 8.9.10.† Let $m$ be a squarefree integer $\ne 1$, and let $a$ be an odd positive integer.
(a) Prove that the Jacobi symbol $(m/a)$ is a periodic function of $a$ of period dividing $4m$.
(b) Show that the Jacobi symbol $(3/a)$ has minimal period 12.


(c) Prove that if m is odd and (a,2m) = 1, then (435) = (=) (#*). 


Now suppose that $m \equiv 3 \pmod 4$.
(d) Prove that there exists an integer r for which (4%) = —1. 


(e) Prove that 47, (4%) =0. 


Exercise 8.9.11. (This extends exercise 8.2.4.)
(a) Let $n = pq$ where $p$ and $q$ are distinct primes $\equiv 3 \pmod 4$, and $m = \frac{1}{2}\left(\frac{p-1}{2}\cdot\frac{q-1}{2} + 1\right)$. Show that if $(a/p) = (a/q) = 1$ and $b \equiv a^m \pmod n$, then $b^2 \equiv a \pmod n$.
(b) Any odd prime $p$ can be written uniquely in the form $p = 1 + 2^k m$ where $m$ is odd and $k \ge 1$. Prove that if $a$ is a $2^k$th power mod $p$ and $b \equiv a^{(m+1)/2} \pmod p$, then $b^2 \equiv a \pmod p$.


If prime $p \equiv 1 \pmod 4$ and $(a/p) = 1$ but $a$ is not a fourth power mod $p$, then we do not know how to use this idea to find a square root of $a \pmod p$. Known methods in this case are considerably more complicated (see, e.g., [CP05]).




Exercise 8.9.12. Suppose that $p$ is a prime $\equiv 3 \pmod 4$ and $(b/p) = 1$. Prove that there are exactly two solutions $x \pmod p$ to $x^4 \equiv b \pmod p$.


Exercise 8.9.13.† Show that if $p$ is a prime which divides $m^2 - 15$ for some integer $m$, then either $p = 2$, 3, or 5, or $p \equiv \pm 1, \pm 7, \pm 11$, or $\pm 17 \pmod{60}$.


Exercise 8.9.14.† Show that if $p$ is a prime $\equiv 1 \pmod 4$, then $-1$ is a fourth power (mod $p$) if and only if 2 is a square mod $p$.


Exercise 8.9.15.† If $(a,n) = 1$, then multiplication by $a \pmod n$ generates a permutation of the reduced residues mod $n$. For example, for $3 \pmod 7$ we get the permutation $\sigma_{3,7} := (1, 3, 2, -1, -3, -2)$, whereas for $2 \pmod 7$ we get the permutation $\sigma_{2,7} := (1, 2, 4)(3, 6, 5)$. Prove that if $p$ is prime and $(a, p) = 1$, then the signature⁶ of the permutation $\sigma_{a,p}$ is

$$\varepsilon(\sigma_{a,p}) = \left(\frac{a}{p}\right).$$


Exercise 8.9.16. (a) Prove that (354) = 0 if (m,n) > 1. 


(b) Suppose that n = mq+r where n >m>r > 2. Prove that (374) =-— (33). 


(c)? Prove that if n/m = [ao,a1,..., ax] with (n,m) = 1 and ag > 2, the (+54) =(-1)"41, 


Infinitely many primes. 


Exercise 8.9.17.† Fix an odd, squarefree integer $n > 1$. Prove that there are infinitely many primes $p$ for which $(p/n) = -1$.


Exercise 8.9.18.† Let $n$ be a squarefree integer.
(a) By considering the prime divisors of $m^2 - n$, for well-chosen values of $m$, prove that there are infinitely many primes $p$ for which $(n/p) = 1$.
(b) Deduce that there are infinitely many primes $\equiv 1 \pmod 3$.
(c) Refine this to deduce that there are infinitely many primes $\equiv 7 \pmod{12}$.
(d) Prove that there are infinitely many primes $\equiv 11 \pmod{12}$.
(e) Prove that there are infinitely many primes $\equiv 5 \pmod 8$.
(f) Prove that there are infinitely many primes $\equiv 7 \pmod 8$.
(g) Prove that there are infinitely many primes $\equiv 3 \pmod 8$.
(h) Prove that there are infinitely many primes $\equiv 5 \pmod{12}$.


Exercise 8.9.19.† Fix an odd, squarefree integer $n > 1$. Using exercises 8.9.18(a) and 8.7.7, prove that there are infinitely many primes $p$ for which $(p/n) = 1$.


In Ram Murty’s undergraduate thesis (1976, Carleton University, Ottawa) he defined a Euclidean proof that there are infinitely many primes $\equiv a \pmod q$ to be one in which we use a polynomial all of whose prime divisors either divide $q$ or are $\equiv 1$ or $a \pmod q$. Several of the proofs for the different arithmetic progressions in the last three questions can be formulated in this way. We gave such a proof for $a = 1$ in Theorem 7.8. Murty went on to show that there is a Euclidean proof that there are infinitely many primes $\equiv a \pmod q$ if and only if $a^2 \equiv 1 \pmod q$ (as in all our examples here). To prove that there are infinitely many primes $\equiv 2$ or $\equiv 3 \pmod 5$, or $\equiv 5 \pmod 7$, etc., we will have to develop other techniques.


⁶ Any permutation can be written as a sequence of transpositions (swaps) of pairs of elements.
Although the sequence, and even the number of swaps in such a sequence, is not unique, the parity of
the number of swaps is. The signature of the permutation records this parity: it is −1 or 1 (for an
odd or even number of transpositions, respectively).




Further reading on Euclidean proofs 


[1] M. Ram Murty and N. Thain, Primes in certain arithmetic progressions, Funct. Approx. Comment. 
Math. 35 (2006), 249-259. 


Primitive roots for specially chosen primes. 


Exercise 8.9.20.† Suppose that q and p = 2q + 1 are odd (Sophie Germain twin) primes.
(a) Show that if p ≡ 3 (mod 8), then 2 is a primitive root mod p (e.g., 11, 59, 83, 107, ...).
(b) Show that if p ≡ 7 (mod 8), then −2 is a primitive root mod p.
(c) Prove that −3 is a primitive root mod p, but 3 is not.


Exercise 8.9.21.† Suppose that q and p = 4q + 1 are odd primes. Prove that 2, −2, 3, and −3
are all primitive roots mod p.
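The claims of exercises 8.9.20 and 8.9.21 can be tested by brute force. In this sketch (helper names are ours) the loops start at q = 5, since for q = 3 the statements about 3 and −3 fail: mod 7 we have −3 ≡ 4 = 2², and mod 13 we have 3³ = 27 ≡ 1.

```python
def is_prime(n):
    """Trial-division primality test (fine for the small ranges used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def order_mod(g, p):
    """Multiplicative order of g modulo the prime p (p must not divide g)."""
    g %= p
    k, x = 1, g
    while x != 1:
        x = x * g % p
        k += 1
    return k


def is_primitive_root(g, p):
    return order_mod(g, p) == p - 1


def check_exercise_8_9_20(q_max):
    """Check 8.9.20 for primes 5 <= q < q_max with p = 2q + 1 prime."""
    for q in range(5, q_max):
        p = 2 * q + 1
        if not (is_prime(q) and is_prime(p)):
            continue
        if p % 8 == 3:
            assert is_primitive_root(2, p), p
        else:  # q odd forces p = 3 or 7 (mod 8), so here p = 7 (mod 8)
            assert is_primitive_root(-2, p), p
        assert is_primitive_root(-3, p) and not is_primitive_root(3, p), p
    return True


def check_exercise_8_9_21(q_max):
    """Check 8.9.21 for primes 5 <= q < q_max with p = 4q + 1 prime."""
    for q in range(5, q_max):
        p = 4 * q + 1
        if is_prime(q) and is_prime(p):
            assert all(is_primitive_root(g, p) for g in (2, -2, 3, -3)), p
    return True


print(check_exercise_8_9_20(2000), check_exercise_8_9_21(2000))
```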


Exercise 8.9.22.† Suppose that the Fermat number F_m = 2^{2^m} + 1 is prime with m ≥ 1. Prove
that if (q/F_m) = −1, then q is a primitive root mod F_m. (We deduce that 3 and 5 (for m > 1)
are primitive roots mod F_m by exercise 8.5.4.)


Alternate proofs of the value of (2/n). 


Exercise 8.9.23. Let p be a prime ≡ 1 (mod 4), so that there exists a reduced residue r (mod p)
such that r² ≡ −1 (mod p).
(a) By expanding (r + 1)² (mod p), prove that 2 is a square mod p if and only if r is a square
mod p.
(b) Prove that r is a square mod p if and only if there is an element of order 8 mod p.
(c) Use Theorem 7.6 to deduce that 2 is a square mod p if and only if p ≡ 1 (mod 8).


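Exercise 8.9.24 (Proof of (8.7.2)). By induction on odd n > 1. By the law of quadratic reciprocity,
as stated in (8.7.3), we have

$$\left(\frac{2}{n}\right) = \left(\frac{-1}{n}\right)\left(\frac{n-2}{n}\right) = \left(\frac{-1}{n}\right)\left(\frac{n}{n-2}\right) = \left(\frac{-1}{n}\right)\left(\frac{2}{n-2}\right),$$

as one of n and n − 2 is ≡ 1 (mod 4). Complete the proof.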


Exercise 8.9.25. Every odd prime p may be written in the form p = 4k + σ with σ = (−1/p).


We will show that (2/p) = (−1)^k, which implies Theorem 8.4. Let m = 2k + σ so that 2m = p + σ.
Verify that


Gee gee ware acy 


and deduce the result from here. 


Further proofs of the law of quadratic reciprocity. 


Exercise 8.9.26.† (a) In the mid-18th century, Euler conjectured that if m > n are coprime,
odd, positive integers, then (a/m) = (a/n), where m − n = 4a if m ≡ n (mod 4), and m + n = 4a
otherwise. Use the law of quadratic reciprocity to prove Euler’s conjecture.


(b) Use Euler’s conjecture to prove (8.7.3), the law of quadratic reciprocity.
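Euler's formulation in exercise 8.9.26(a), and the formula (8.7.2) for (2/n), can be spot-checked numerically. This sketch (helper names are ours) computes the Jacobi symbol straight from its definition, as a product of Legendre symbols over the prime factors of n, each found with Euler's criterion, so the check does not presuppose quadratic reciprocity.

```python
from math import gcd


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def jacobi_by_definition(a, n):
    """Jacobi symbol (a/n) for odd n >= 1: product of (a/p) over p | n, with multiplicity."""
    assert n >= 1 and n % 2 == 1
    result, d = 1, 3
    while n > 1:
        if d * d > n:
            d = n  # the remaining cofactor is prime
        while n % d == 0:
            result *= legendre(a, d)
            n //= d
        d += 2
    return result


def check_two_over_n(n_max):
    """(2/n) = (-1)^((n^2 - 1)/8) for odd n < n_max, i.e. formula (8.7.2)."""
    return all(jacobi_by_definition(2, n) == (-1) ** ((n * n - 1) // 8)
               for n in range(1, n_max, 2))


def check_euler_conjecture(bound):
    """Euler: (a/m) = (a/n) with 4a = m - n if m = n (mod 4), else 4a = m + n."""
    for m in range(3, bound, 2):
        for n in range(1, m, 2):
            if gcd(m, n) != 1:
                continue
            a = (m - n) // 4 if (m - n) % 4 == 0 else (m + n) // 4
            assert jacobi_by_definition(a, m) == jacobi_by_definition(a, n), (m, n)
    return True


print(check_two_over_n(1000), check_euler_conjecture(150))
```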


Scholze (1938) proved Euler’s conjecture using Gauss’s Lemma (Theorem 8.6) and so gave a
different proof of the law of quadratic reciprocity. 


Exercise 8.9.27.† Finally we present my own variation of Rousseau’s proof of quadratic reciprocity,
as a series of (challenging) exercises. Let p < q be odd primes, and let n = pq. Let

$$A = \prod_{\substack{1 \le m < n/2\\ (m,n)=1}} m.$$

In the proof given in section 8.6 we showed that A ≡ (−1/q)(q/p) (mod p) and, analogously,
A ≡ (−1/p)(p/q) (mod q). We now evaluate A (mod n) much as in Gauss’s proof of Wilson’s
Theorem, where we paired up each residue with its inverse: Let S be the set of (unordered)
pairs {a, b} ⊂ [1, n/2) for which ab ≡ 1 or −1 (mod n).
(a) Prove that the residues a and b are distinct unless a² ≡ 1 or −1 (mod n).
(b) Prove that if a² ≡ 1 (mod n), then a ≡ 1, −1, r, or −r (mod n) for some r ≢ ±1
(mod n).
(c) Prove that the product of the integers a ∈ [1, n/2) with a² ≡ 1 (mod n) is ≡ ±r (mod n).
(d) Prove that if b² ≡ −1 (mod n), then p ≡ q ≡ 1 (mod 4). In this case:
• Deduce that the product of the integers b ∈ [1, n/2) for which b² ≡ −1 (mod n) is ≡ ±r
(mod n).
• Deduce that A ≡ ±1 (mod n).
• Combine the above to show that (−1/q)(q/p) = (−1/p)(p/q).
(e) If at least one of p and q is ≡ 3 (mod 4):
• Deduce that A ≡ ±r (mod n).
• Combine the above to show that (−1/q)(q/p) = −(−1/p)(p/q).
(f) Deduce the law of quadratic reciprocity.
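The congruences for A quoted at the start of exercise 8.9.27, and the case split in parts (d) and (e), can be observed on small examples. This sketch (names are ours) computes A mod n directly and takes r to be any square root of 1 mod n other than ±1; since ±r covers both such roots, the choice does not matter.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def sign_mod(x, k):
    """Return +1 if x = 1 (mod k) and -1 if x = -1 (mod k)."""
    assert x % k in (1, k - 1)
    return 1 if x % k == 1 else -1


def rousseau_A(p, q):
    """A = product of the m in [1, n/2) with gcd(m, n) = 1, reduced mod n = pq."""
    n = p * q
    A = 1
    for m in range(1, (n + 1) // 2):  # n is odd, so m < n/2 means m <= (n - 1)/2
        if m % p and m % q:
            A = A * m % n
    return A


def check_rousseau(p, q):
    """Check A mod p, A mod q, and which of parts (d) or (e) applies."""
    n = p * q
    A = rousseau_A(p, q)
    assert sign_mod(A, p) == legendre(-1, q) * legendre(q, p)
    assert sign_mod(A, q) == legendre(-1, p) * legendre(p, q)
    r = next(x for x in range(2, n - 1) if x * x % n == 1)  # a square root of 1, not +-1
    if p % 4 == 1 and q % 4 == 1:
        assert A in (1, n - 1)
    else:
        assert A in (r, n - r)
    return True


primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
print(all(check_rousseau(p, q) for i, p in enumerate(primes) for q in primes[i + 1:]))
```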


Appendix 8A. Eisenstein’s proof of quadratic reciprocity


8.10. Eisenstein’s elegant proof, 1844 


A lemma of Gauss gives a complicated but useful formula to determine (a/p): 


Theorem 8.6 (Gauss’s Lemma). Given an integer a which is not divisible by the odd
prime p, define r_n to be the absolutely least residue of an (mod p), and then define
the set N := {1 ≤ n ≤ (p − 1)/2 : r_n < 0}. Then (a/p) = (−1)^{|N|}.


For example, if a = 3 and p = 7, then r_1 = 3, r_2 = −1, r_3 = 2, so that N = {2}
and therefore (3/7) = (−1)^1 = −1.
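Gauss's Lemma can be checked against Euler's criterion with a few lines of code. This sketch (names are ours) builds the set N exactly as in Theorem 8.6, using the absolutely least residue in (−p/2, p/2), and allows negative a as well.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def absolutely_least_residue(x, p):
    """The residue of x (mod p) lying in (-p/2, p/2), for odd p."""
    r = x % p
    return r - p if r > p // 2 else r


def gauss_lemma_set(a, p):
    """The set N = {1 <= n <= (p-1)/2 : r_n < 0} of Theorem 8.6, as a list."""
    return [n for n in range(1, (p - 1) // 2 + 1)
            if absolutely_least_residue(a * n, p) < 0]


print(gauss_lemma_set(3, 7))  # [2], matching the worked example
for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]:
    for a in range(-2 * p, 2 * p):
        if a % p:
            assert (-1) ** len(gauss_lemma_set(a, p)) == legendre(a, p)
```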


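Proof. For each m, 1 ≤ m ≤ (p − 1)/2, there is exactly one integer n, 1 ≤ n ≤ (p − 1)/2,
such that r_n = m or −m (for if an ≡ ±an' (mod p), then p | a(n ∓ n'), and
so p | n ∓ n', which is possible in this range only if n = n'). Therefore

$$\left(\frac{p-1}{2}\right)! = \prod_{1\le m\le \frac{p-1}{2}} m = \prod_{\substack{1\le n\le \frac{p-1}{2}\\ n\notin N}} r_n \prod_{\substack{1\le n\le \frac{p-1}{2}\\ n\in N}} (-r_n) \equiv \prod_{\substack{1\le n\le \frac{p-1}{2}\\ n\notin N}} an \prod_{\substack{1\le n\le \frac{p-1}{2}\\ n\in N}} (-an) = a^{\frac{p-1}{2}}(-1)^{|N|}\left(\frac{p-1}{2}\right)! \pmod p.$$

Cancelling out the ((p − 1)/2)! from both sides, the result follows from Euler’s criterion.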


This proof is a clever generalization of the proof of Theorem 8.4.




Exercise 8.10.1.† Use Gauss’s Lemma to determine the values of (a) (−1/p) and (b) (3/p),
for all primes p > 3.


Exercise 8.10.2. Let r be the absolutely least residue of N (mod p). Prove that the least
non-negative residue of N (mod p) is given by

$$N - p\left[\frac{N}{p}\right] = \begin{cases} r & \text{if } r \ge 0,\\ p + r & \text{if } r < 0.\end{cases}$$


Corollary 8.10.1. If p is a prime > 2 and a is an odd integer not divisible by p,
then

$$\left(\frac{a}{p}\right) = (-1)^{\sum_{n=1}^{(p-1)/2}\left[\frac{an}{p}\right]}. \tag{8.10.1}$$


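Proof. (Gauss) By exercise 8.10.2 we have

$$\sum_{n=1}^{(p-1)/2}\left(an - p\left[\frac{an}{p}\right]\right) = \sum_{\substack{1\le n\le (p-1)/2\\ n\notin N}} r_n + \sum_{\substack{1\le n\le (p-1)/2\\ n\in N}} (p + r_n) = p|N| + \sum_{n=1}^{(p-1)/2} r_n. \tag{8.10.2}$$

In the proof of Gauss’s Lemma we saw that for each m, 1 ≤ m ≤ (p − 1)/2, there is
exactly one integer n, 1 ≤ n ≤ (p − 1)/2, such that r_n = m or −m, and so r_n ≡ m
(mod 2). Therefore, as a and p are odd, (8.10.2) implies that

$$|N| \equiv \sum_{n=1}^{(p-1)/2}\left[\frac{an}{p}\right] \pmod 2, \quad\text{as}\quad \sum_{n=1}^{(p-1)/2} r_n \equiv \sum_{m=1}^{(p-1)/2} m = \sum_{n=1}^{(p-1)/2} n \pmod 2.$$

We now deduce (8.10.1) from Gauss’s Lemma.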
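Eisenstein's formula (8.10.1) is easy to test numerically. This sketch (names are ours) computes the exponent with floor division and compares (−1) to that power with Euler's criterion, for odd a only. The final line shows why the oddness of a is needed: for a = 2 and p = 3 the exponent is 0, yet (2/3) = −1.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def eisenstein_exponent(a, p):
    """The exponent in (8.10.1): sum of [a*n/p] over 1 <= n <= (p-1)/2."""
    return sum(a * n // p for n in range(1, (p - 1) // 2 + 1))


for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]:
    for a in range(1, 6 * p, 2):  # odd positive a only
        if a % p:
            assert (-1) ** eisenstein_exponent(a, p) == legendre(a, p)

print(eisenstein_exponent(2, 3), legendre(2, 3))  # 0 -1: the formula needs a odd
```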


The exponent ∑_{n=1}^{(p−1)/2} [an/p] on the right-hand side of (8.10.1) looks excessively
complicated. However, it arises in a different context that is easier to work with:


Lemma 8.10.1. Suppose that a and b are odd, coprime positive integers. There
are

$$\sum_{n=1}^{(b-1)/2}\left[\frac{an}{b}\right]$$

lattice points (n, m) ∈ Z² for which 0 < bm < an with 0 < n < b/2.


Proof. We seek the number of lattice points (n, m) inside the triangle bounded
by the lines y = 0, x = b/2 and by = ax. For such a lattice point, n can be any
integer in the range 1 ≤ n ≤ (b − 1)/2. For a given value of n, the triangle contains the
lattice points (n, m) where m is any integer in the range 0 < m < an/b. These are
the lattice points in the shaded rectangle in Figure 8.1.


Figure 8.1. The shaded rectangle covers the lattice points (n, m) with 1 ≤ m ≤ [an/b].


Evidently m ranges from 1 to [an/b], and so there are [an/b] such lattice points. Summing
this up over the possible values of n gives the lemma.


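Corollary 8.10.2. If a and b are odd coprime positive integers, then

$$\sum_{n=1}^{(b-1)/2}\left[\frac{an}{b}\right] + \sum_{m=1}^{(a-1)/2}\left[\frac{bm}{a}\right] = \frac{a-1}{2}\cdot\frac{b-1}{2}.$$

Proof. The idea is to split the rectangle

$$R = \{(x, y):\ 0 < x < b/2 \text{ and } 0 < y < a/2\}$$

into two parts: the points in R on or below the line by = ax, that is, in the region

$$A := \{(x, y):\ 0 < x < b/2 \text{ and } 0 < y \le ax/b\};$$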


Figure 8.2. Splitting the rectangle R into two parts.


and the points in R above the line by = ax, that is, in the region

$$B := \{(x, y):\ 0 < x < by/a \text{ and } 0 < y < a/2\}.$$


We count the lattice points (that is, the points with integer coordinates) in R
and then in A and B together. To begin with

$$R \cap \mathbb{Z}^2 = \left\{(n, m) \in \mathbb{Z}^2:\ 1 \le n \le \frac{b-1}{2} \text{ and } 1 \le m \le \frac{a-1}{2}\right\},$$

so that |R ∩ Z²| = ((a − 1)/2) · ((b − 1)/2).


Since there are no lattice points in R on the line by = ax (for bm = an with (a, b) = 1
would force b | n, which is impossible when 0 < n < b/2), we have

$$A \cap \mathbb{Z}^2 = \{(n, m) \in \mathbb{Z}^2:\ 0 < n < b/2 \text{ and } 0 < bm < an\},$$

and so |A ∩ Z²| = ∑_{n=1}^{(b−1)/2} [an/b] by Lemma 8.10.1. Similarly

$$B \cap \mathbb{Z}^2 = \{(n, m) \in \mathbb{Z}^2:\ 0 < m < a/2 \text{ and } 0 < an < bm\},$$

and so |B ∩ Z²| = ∑_{m=1}^{(a−1)/2} [bm/a] by Lemma 8.10.1 (with the roles of a and b
interchanged). The result then follows from the observation that A ∩ Z² and B ∩ Z²
partition R ∩ Z².
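The counting argument can be replayed by brute force. This sketch (names are ours) sorts every lattice point of R into "below", "on" or "above" the line by = ax, then checks that nothing lies on the line, that the two counts equal the floor sums of Lemma 8.10.1, and that they add up as in Corollary 8.10.2.

```python
from math import gcd


def floor_sum(a, b):
    """Sum of [a*n/b] over 1 <= n <= (b-1)/2."""
    return sum(a * n // b for n in range(1, (b - 1) // 2 + 1))


def count_lattice_points(a, b):
    """Lattice points of R (1 <= n <= (b-1)/2, 1 <= m <= (a-1)/2) below, on, above b*y = a*x."""
    below = on = above = 0
    for n in range(1, (b - 1) // 2 + 1):
        for m in range(1, (a - 1) // 2 + 1):
            if b * m < a * n:
                below += 1
            elif b * m == a * n:
                on += 1
            else:
                above += 1
    return below, on, above


for a in range(1, 60, 2):
    for b in range(1, 60, 2):
        if gcd(a, b) == 1:
            below, on, above = count_lattice_points(a, b)
            assert on == 0
            assert below == floor_sum(a, b) and above == floor_sum(b, a)
            assert below + above == ((a - 1) // 2) * ((b - 1) // 2)
```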


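Eisenstein’s proof of the law of quadratic reciprocity. By Corollary 8.10.1
with a = q, and then with the roles of p and q reversed, and then by Corollary
8.10.2, we deduce the desired law of quadratic reciprocity:

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{n=1}^{(p-1)/2}\left[\frac{qn}{p}\right] + \sum_{m=1}^{(q-1)/2}\left[\frac{pm}{q}\right]} = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}.$$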


Appendix 8B. Small quadratic non-residues


Given a prime p we wish to find as small an integer q as possible which is a quadratic 
non-residue mod p. This has become a central issue in several important questions 
in number theory. We discuss it at some length in this appendix and present some 
sophisticated techniques for finding small q. 


The number 1 is always a quadratic residue mod p, as are 4, 9, 16, .... If 2 and 3
are quadratic non-residues, then 2 · 3 = 6 is a quadratic residue, by Theorem 8.1(iii).
Hence at least one of 2, 3, and 6 is a quadratic residue, and this kind of reasoning
implies that one is always guaranteed lots of small quadratic residues. How about
small quadratic non-residues mod p? Since half the reduced residues are quadratic
non-residues, one might expect to find lots of them, but a priori one is only guaranteed
to find one that is ≤ (p + 1)/2. Can one do better? This is an important question in
number theory and one where the best results known are surprisingly weak.


Exercise 8.11.1. Prove that the smallest quadratic non-residue mod p must be a prime. 


We can therefore restrict our attention to determining the smallest prime q for
which (q/p) = −1. The analogous question for quadratic residues (the smallest prime q
with (q/p) = 1) is also of interest (remember that finding the smallest integer that is a
quadratic residue is of limited interest since we always have 1, 4, 9, ..., and at least one
of 2, 3, and 6). However, this turns out to be more difficult than one might guess, as it
ties in with deep algebraic issues (which are discussed in section 12.5). To give one
surprising example, one finds that every one of the dozen primes q ≤ 37 is a quadratic
non-residue mod 163. Therefore 41 is the smallest prime which is a quadratic residue mod 163.
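The example of 163 takes only a moment to confirm by machine. This sketch (names are ours) tests each prime with Euler's criterion.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_residue(a, p):
    """True if a is a non-zero square mod the odd prime p (Euler's criterion)."""
    return pow(a, (p - 1) // 2, p) == 1


def smallest_prime_residue(p):
    """Smallest prime q with (q/p) = 1."""
    return next(q for q in range(2, p) if is_prime(q) and is_residue(q, p))


small_primes = [q for q in range(2, 38) if is_prime(q)]
print(len(small_primes), any(is_residue(q, 163) for q in small_primes))  # 12 False
print(smallest_prime_residue(163))                                      # 41
```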


8.11. The least quadratic non-residue modulo p
Theorem 8.7. For every odd prime p there exists a prime q < √p + 1 for which (q/p) = −1.




Proof. Let q be the least quadratic non-residue (mod p). By exercise 8.11.1 we
know that q is prime. Let m be the least integer > p/q, so that p/q < m < p/q + 1,
and therefore 0 < a := mq − p < q. Since a < q, the minimality of q implies that a is a
quadratic residue (mod p), and so, as mq ≡ a (mod p),

$$\left(\frac{m}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{q}{p}\right) = 1 \cdot (-1) = -1.$$

This implies that m is a quadratic non-residue and so q ≤ m < p/q + 1, which implies
that q² < p + q, and therefore (q − 1)² < q² − q < p, and the result follows by taking
square roots.
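Both facts used in this proof, that the least non-residue is prime (exercise 8.11.1) and that it is below √p + 1, can be checked over a range of primes. This sketch (names are ours) compares integers only, using that q < √p + 1 is equivalent to (q − 1)² < p.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_non_residue(p):
    """Smallest positive a with (a/p) = -1, for an odd prime p (Euler's criterion)."""
    return next(a for a in range(2, p) if pow(a, (p - 1) // 2, p) == p - 1)


def check_theorem_8_7(limit):
    for p in range(3, limit):
        if is_prime(p):
            q = least_non_residue(p)
            assert is_prime(q)        # exercise 8.11.1
            assert (q - 1) ** 2 < p   # Theorem 8.7: q < sqrt(p) + 1
    return True


print(check_theorem_8_7(10000))
```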


This is better than the similar result proved in exercise 9.7.3(b), though by
very different methods. 


8.12. The smallest prime q for which p is a quadratic non-residue modulo q


In Gauss’s original proof of quadratic reciprocity he needed to show that each
prime p ≡ 1 (mod 4) is a quadratic non-residue modulo some prime q < p (as part
of an induction argument). This result follows from Theorem 8.7 by quadratic
reciprocity, but we need to avoid using quadratic reciprocity in a proof of quadratic
reciprocity! We now present Gauss’s most ingenious argument.


Theorem 8.8. For every prime p ≡ 1 (mod 8) there exists a prime q < 2√p + 1
such that (p/q) = −1.


Proof. (Gauss) Let m = [√p] and consider the product (p − 1²)(p − 2²) ⋯ (p − m²),
under the assumption that (p/q) = 1 for all odd primes q ≤ 2m + 1. Now since p ≡ 1
(mod 8) and (p/q) = 1 for all odd primes q ≤ 2m + 1, we deduce that for any given
prime power q^e with q ≤ 2m + 1, there exists a residue a_q (mod q^e) such that
p ≡ a_q² (mod q^e), by exercise 8.8.1. Since this is true for each q ≤ 2m + 1 and since
(2m + 1)! is divisible only by powers of primes q ≤ 2m + 1, we use the Chinese
Remainder Theorem to construct an integer A for which p ≡ A² (mod (2m + 1)!);
replacing A by A + (2m + 1)! if necessary, we may assume that A > m.
Now (p, (2m + 1)!) = 1 and so (A, (2m + 1)!) = 1, which implies that A is invertible
mod (2m + 1)!. Therefore

$$(p - 1^2)(p - 2^2)\cdots(p - m^2) \equiv (A^2 - 1^2)(A^2 - 2^2)\cdots(A^2 - m^2) = \frac{(A+m)!}{(A-m-1)!}\cdot\frac{1}{A} \pmod{(2m+1)!}.$$

Now the binomial coefficient $\binom{A+m}{2m+1}$ is an integer, and so

$$\frac{(A+m)!}{(A-m-1)!} = (2m+1)!\binom{A+m}{2m+1} \equiv 0 \pmod{(2m+1)!}.$$

Therefore (2m + 1)! divides (p − 1²)(p − 2²) ⋯ (p − m²). However p < (m + 1)² and
so

$$(2m+1)! \le (p - 1^2)(p - 2^2)\cdots(p - m^2) < \left((m+1)^2 - 1^2\right)\left((m+1)^2 - 2^2\right)\cdots\left((m+1)^2 - m^2\right) = \frac{(2m+1)!}{m+1},$$

giving a contradiction.
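Theorem 8.8 can be checked empirically by finding, for each prime p ≡ 1 (mod 8) in a range, the least odd prime q with (p/q) = −1. This sketch (names are ours) uses the integer form of the bound: q < 2√p + 1 is equivalent to (q − 1)² < 4p.

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_odd_prime_nonresidue_of(p):
    """Smallest odd prime q with (p/q) = -1, i.e. p is a non-residue mod q."""
    q = 3
    while not (is_prime(q) and pow(p % q, (q - 1) // 2, q) == q - 1):
        q += 2
    return q


def check_theorem_8_8(limit):
    for p in range(17, limit, 8):  # p = 1 (mod 8)
        if is_prime(p):
            q = least_odd_prime_nonresidue_of(p)
            assert (q - 1) ** 2 < 4 * p  # q < 2*sqrt(p) + 1
    return True


print(check_theorem_8_8(20000))
```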




The following proof is also due to Gauss. 
Theorem 8.9. For every prime p ≡ 5 (mod 8) there exists a prime q < √(8p) such
that (p/q) = −1.


Proof. (Gauss) Suppose that p ≡ 5 (mod 8). Let a be the largest integer < √(p/2),
so that a > (p/2)^{1/2} − 1. Now p − 2a² ≡ 3 or 5 (mod 8) and so has a prime divisor
q ≡ 3 or 5 (mod 8) (by exercise 8.1.4(b)). But then, by Theorem 8.4, we have
(2/q) = −1 and so (p/q) = (2a²/q) = −1. Finally

$$q \le p - 2a^2 < p - 2\left(\sqrt{p/2} - 1\right)^2 = 2\left(\sqrt{2p} - 1\right) < \sqrt{8p}.$$
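Gauss's construction is explicit, so it can be run directly. This sketch (names are ours) takes a = isqrt(p // 2), which is the largest integer with 2a² < p, picks the least prime divisor of p − 2a² that is ≡ 3 or 5 (mod 8), and checks the two conclusions of Theorem 8.9.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_divisors(k):
    """Set of primes dividing the positive integer k."""
    primes, d = set(), 2
    while d * d <= k:
        while k % d == 0:
            primes.add(d)
            k //= d
        d += 1
    if k > 1:
        primes.add(k)
    return primes


def theorem_8_9_prime(p):
    """Follow the proof for a prime p = 5 (mod 8) and return the prime q it produces."""
    a = isqrt(p // 2)          # largest integer a with 2a^2 < p, i.e. a < sqrt(p/2)
    M = p - 2 * a * a          # M = 3 or 5 (mod 8)
    assert M % 8 in (3, 5)
    q = min(d for d in prime_divisors(M) if d % 8 in (3, 5))
    assert pow(p % q, (q - 1) // 2, q) == q - 1   # (p/q) = -1
    assert q * q < 8 * p                           # q < sqrt(8p)
    return q


print(all(theorem_8_9_prime(p) for p in range(5, 50000, 8) if is_prime(p)))
```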


It is hard to resist giving another proof of this type. 


Theorem 8.10. If p ≡ 1 (mod 4), p > 17, there exists a prime q < 4√p + 4 with
q ≡ 3 (mod 4) and (−p/q) = −1.


Proof. Let 2a be the even integer immediately greater than √p, so that 4a² − p ≡ 3
(mod 4). Let q be a prime divisor of 4a² − p which is ≡ 3 (mod 4) (one exists since
4a² − p is a positive integer ≡ 3 (mod 4)), so that p ≡ 4a² (mod q) and hence
(p/q) = 1. But then (−p/q) = −1 as q ≡ 3 (mod 4). Also

$$2a < \sqrt{p} + 2 \quad\text{and so}\quad q \le 4a^2 - p < \left(\sqrt{p} + 2\right)^2 - p = 4\left(\sqrt{p} + 1\right).$$
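The same kind of check works for the construction in Theorem 8.10. In this sketch (names are ours) 2a is computed from s = isqrt(p) as the smallest even integer exceeding √p (p is not a square), and the bound q < 4√p + 4 is tested in integer form as q < 4 or (q − 4)² < 16p.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_divisors(k):
    """Set of primes dividing the positive integer k."""
    primes, d = set(), 2
    while d * d <= k:
        while k % d == 0:
            primes.add(d)
            k //= d
        d += 1
    if k > 1:
        primes.add(k)
    return primes


def theorem_8_10_prime(p):
    """Follow the proof for a prime p = 1 (mod 4), p > 17, and return the prime q."""
    s = isqrt(p)                                   # s < sqrt(p) < s + 1
    two_a = s + 1 if (s + 1) % 2 == 0 else s + 2   # smallest even integer > sqrt(p)
    M = two_a * two_a - p                          # 4a^2 - p = 3 (mod 4)
    assert M % 4 == 3
    q = min(d for d in prime_divisors(M) if d % 4 == 3)
    assert pow(p % q, (q - 1) // 2, q) == 1        # (p/q) = 1
    assert pow(-p % q, (q - 1) // 2, q) == q - 1   # (-p/q) = -1
    assert q < 4 or (q - 4) ** 2 < 16 * p          # q < 4*sqrt(p) + 4
    return q


print(all(theorem_8_10_prime(p) for p in range(29, 50000, 4) if is_prime(p)))
```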


8.13. Character sums and the least quadratic non-residue 


If there are N quadratic residues and R quadratic non-residues mod p up to x,
where 1 ≤ x < p is an integer, then N + R = x and N − R is the value of the character sum

$$S(x) := \sum_{n \le x}\left(\frac{n}{p}\right).$$

As each |(n/p)| ≤ 1, we trivially have |S(x)| ≤ x and would like to obtain improvements
on this upper bound when possible. For if S(x) = 0, then N = R = x/2;
and if S(x) is small compared to x, then roughly half the residues are quadratic
residues, and roughly half are quadratic non-residues; more precisely

$$N = \tfrac{1}{2}\left(x + S(x)\right) \quad\text{and}\quad R = \tfrac{1}{2}\left(x - S(x)\right).$$


Now S(p) = S(p − 1) = 0 as there are (p − 1)/2 quadratic residues mod p, and the
same number of non-residues. The sum S(p) sums the value of (n/p) as n runs
through a complete set of residues mod p and so is a complete character sum. In
contrast, any sum S(x) with x < p is known as an incomplete character sum.

If (−1/p) = 1, then (n/p) = ((p − n)/p) and so ∑_{n<p/2} (n/p) = ∑_{p/2<n<p} (n/p).
This implies S((p − 1)/2) = ½ S(p − 1) = 0. On the other hand, if (−1/p) = −1, then
S((p − 1)/2) ≡ ∑_{n≤(p−1)/2} 1 = (p − 1)/2 ≡ 1 (mod 2) and so is non-zero. Its value relates to
questions about quadratic forms we will discuss in section 12.15 of appendix 12D.
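The facts just listed about S(x) can be confirmed for small primes. This sketch (names are ours) computes S(x) term by term and checks that complete sums vanish, that S has period p, the parity statement for S((p − 1)/2), and the formula N = ½(x + S(x)).

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p; equals 0 when p divides n."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def S(x, p):
    """The character sum S(x) = sum of (n/p) over 1 <= n <= x."""
    return sum(legendre(n, p) for n in range(1, x + 1))


for p in [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]:
    assert S(p - 1, p) == 0 and S(p, p) == 0              # complete sums vanish
    assert all(S(x + p, p) == S(x, p) for x in range(p))  # period p
    half = S((p - 1) // 2, p)
    if p % 4 == 1:
        assert half == 0        # (-1/p) = 1
    else:
        assert half % 2 == 1    # (-1/p) = -1: odd, hence non-zero
    x = (p - 1) // 3 + 1        # any 1 <= x < p would do
    N = sum(1 for n in range(1, x + 1) if legendre(n, p) == 1)
    assert 2 * N == x + S(x, p)
```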


Exercise 8.13.1. (a) Prove that S(x) is periodic with period p.
(b) Prove that |S(a)| < for all integers x. 




The non-zero values of (n/p) are 1 and −1. If we imagine that the sums of (n/p)
are distributed much like sums of independent random variables taking values 1
and −1 with equal probability, then we might expect that the maximum value of
|S(x)| is about √(x log p); and thus |S(x)| should be no more than √(p log p) as we
vary over x. This is confirmed, up to a factor of √(log p), by the Pólya–Vinogradov
Theorem (1919), which states that for any integers M and N ≥ 1 one has

$$\left|\sum_{n=M+1}^{M+N}\left(\frac{n}{p}\right)\right| \ll \sqrt{p}\,\log p.$$


This improves the trivial upper bound ≤ N, if N is large enough. However if
N ≤ √p, then the Pólya–Vinogradov bound is worse than the trivial bound, and it
would be useful to obtain an improvement. At the very least we might hope that
for any given ε > 0, we have

$$\left|\sum_{n=1}^{x}\left(\frac{n}{p}\right)\right| \le \varepsilon x$$

once x is sufficiently large (as a function of p and ε). We believe that this is true if,
for instance, x = p^δ for any fixed δ > 0. However the best result known is due to
Burgess (1962), who showed that this holds when x = p^{1/4+δ}. One can deduce from
this that, for any fixed δ > 0 and all sufficiently large primes p, the least quadratic
non-residue mod p is ≤ p^{1/4+δ}, and, by a more involved argument, is ≤ p^{1/(4√e)+δ}.
Burgess’s result has not been significantly improved in a long time and falls far short
of Vinogradov’s conjecture that, for any fixed ε > 0, the least quadratic non-residue is
< p^ε for all sufficiently large p. These questions will be discussed in detail in [GS].
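To get a feel for the Pólya–Vinogradov bound one can compute the largest interval sum exactly for small primes. Every interval sum equals S(M + N) − S(M), and S has period p, so the maximum is (max S) − (min S) over one period. This sketch (names are ours) prints the worst ratio to √p log p for primes below 2000; it illustrates the size of the implied constant in this range only and says nothing about large p.

```python
import math


def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def max_interval_sum(p):
    """Largest |sum_{n=M+1}^{M+N} (n/p)| over all integers M and N >= 1."""
    partial, s = [0], 0
    for n in range(1, p):
        s += legendre(n, p)
        partial.append(s)   # partial[x] = S(x) for 0 <= x <= p - 1
    return max(partial) - min(partial)


ratios = [max_interval_sum(p) / (math.sqrt(p) * math.log(p))
          for p in range(3, 2000) if is_prime(p)]
print(max(ratios))
```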


What is the true size of the least quadratic non-residue mod p? It is believed
that there is a quadratic non-residue mod p that is < 2(log p)², but we cannot prove
this (though it does follow from the Generalized Riemann Hypothesis).
Brave souls have even argued that the least quadratic non-residue might always be
≤ c log p log log p for a suitable constant c > 0. One can show that there are primes
p for which the least quadratic non-residue is at least this large.


To find a quadratic non-residue we might try out 2, 3, 4, ... until we find one;
we can test whether each given integer a is a quadratic non-residue by calculating
the Legendre symbol (a/p) (using the algorithm described in section 8.5). We very
much expect to find one with a < 2(log p)², which should be a quick calculation. If
we fail, then there is an extraordinary bonus: It would imply that one of the most
famous conjectures in mathematics, the Generalized Riemann Hypothesis, is false!
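A minimal sketch of this deterministic search. The Legendre symbol is computed with a standard binary Jacobi-symbol algorithm built on (8.7.2) and quadratic reciprocity; this is our version and may differ in details from the algorithm of section 8.5. The comparison with 2(log p)² is only an empirical observation over a small range.

```python
import math


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via reciprocity (no factoring needed)."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:            # remove factors of 2 using (2/n), cf. (8.7.2)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # flip the symbol using quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0  # n ends as gcd(a, n)


def least_non_residue(p):
    """Try 2, 3, 4, ... until (a/p) = -1."""
    a = 2
    while jacobi(a, p) != -1:
        a += 1
    return a


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


primes = [p for p in range(3, 20000) if is_prime(p)]
print(all(least_non_residue(p) < 2 * math.log(p) ** 2 for p in primes))
```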


Since half the reduced residues are quadratic residues and half are quadratic
non-residues mod p, if we pick a reduced residue mod p at random, then the probability
that it is a quadratic non-residue is 1/2. If it was a quadratic residue, then
we can try again, and if we get another quadratic residue, then try again, etc. In
other words, pick integers a_1, a_2, ..., a_k, ... from {1, 2, 3, ..., p − 1} at random until
we find a quadratic non-residue mod p. The probability that none of a_1, a_2, ..., a_k
are quadratic non-residues mod p is 1/2^k. With k = 100 it is inconceivable that
we could fail! This test is expected (but not certain) to run in polynomial time
(see section 7.14 of appendix 7A) and so is said to be a random polynomial time
algorithm.
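The randomized procedure just described might be sketched as follows. Here Euler's criterion stands in for the section 8.5 algorithm, and the cap of 100 attempts and the seed are arbitrary choices of ours.

```python
import random


def random_non_residue(p, rng, max_tries=100):
    """Pick random a in [1, p-1] until (a/p) = -1; give up after max_tries attempts."""
    for _ in range(max_tries):
        a = rng.randrange(1, p)
        if pow(a, (p - 1) // 2, p) == p - 1:  # Euler's criterion: a is a non-residue
            return a
    return None  # happens with probability 2^(-max_tries)


rng = random.Random(2024)
for p in [3, 7, 163, 1000003]:
    a = random_non_residue(p, rng)
    print(p, a)
```

Passing an explicit `random.Random` instance keeps runs reproducible without touching the global generator.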


Appendix 8C. The first proof of quadratic reciprocity


8.14. Gauss’s original proof of the law of quadratic reciprocity 


Gauss gave four proofs of the law of quadratic reciprocity, and there are now literally
hundreds of proofs. None of the proofs are easy. For an elementary textbook like
this, one wishes to avoid any deeper ideas, which considerably cuts down the number
of choices. The one that has been long preferred stems from an idea of Eisenstein
and is discussed in section 8.10. It ends up with an elegant lattice point counting
argument, though the intermediate steps are difficult to follow and motivate. Gauss’s
very first proof was long and complicated, yet elementary, and the motivation is quite
clear. Subsequent authors⁷ have shortened Gauss’s proof, and we present a version
of that proof here.


We define (7) = (™) and prove (8.7.3) extended to all odd integers, negative 
as well as positive. If (8.7.3) holds for (m, n) with given values of |m| and |n|, then
(8.7.3) holds for (±m, ±n) since


(2VB)= GOO -cr cower coe 


We assume that |m| > |n| > 1 and prove the result by induction on |m|, followed 
by induction on |n|. If m = ab is composite with 1 < a,b < m, then 


a) a) Ge (7) (2) es = (-1)°R "F(T = (1), 


by exercise The analogous proof works if n is composite. We are therefore 
left with the case that |n| = p < |m| = q are odd primes. The proof is modeled on
Gauss’s induction step for evaluating (2/n) for integers n ≡ ±3 (mod 8), given at the
end of section 8.7.

⁷ A preprint may be found at http://www.math.cornell.edu/~web401/steve.gauss17gon.pdf, entitled
The mathematics of Gauss, by David Savitt.

There are two cases here:
• When (p/q) = 1 or (−p/q) = 1, let ℓ = p or −p, respectively, so that (ℓ/q) = 1,
and we will prove the result with p replaced by ℓ: There exists an even integer
e, 1 ≤ e ≤ q − 1, such that e² ≡ ℓ (mod q), and therefore there exists an
integer s for which

$$e^2 = \ell + qs.$$

– If p does not divide s, then (ℓ/s) = 1 as e² ≡ ℓ (mod s), and (qs/ℓ) = 1 as
e² ≡ qs (mod ℓ). Moreover |s| = |e² − ℓ|/q < q²/q = q as p < q, so the
reciprocity law works for the pair ℓ, s by the induction hypothesis. As
(ℓ/q) = 1, we therefore deduce

$$\left(\frac{\ell}{q}\right)\left(\frac{q}{\ell}\right) = \left(\frac{q}{\ell}\right) = \left(\frac{s}{\ell}\right) = \left(\frac{s}{\ell}\right)\left(\frac{\ell}{s}\right) = (-1)^{\frac{\ell-1}{2}\cdot\frac{s-1}{2}}$$

by the induction hypothesis. This yields the result if ℓ ≡ 1 (mod 4). On
the other hand if ℓ ≡ −1 (mod 4), then qs = e² − ℓ ≡ 1 (mod 4), that
is, s ≡ q (mod 4), and the result follows.

– If p divides s, then p divides e² and hence e, so we write s = ℓS, e = ℓE to obtain
ℓE² = 1 + qS, and so (ℓ/S) = (−qS/ℓ) = 1. Therefore, by the induction hypothesis
for the pair ℓ, −S (as (ℓ/S) = 1),

$$\left(\frac{q}{\ell}\right) = \left(\frac{-S}{\ell}\right) = (-1)^{\frac{\ell-1}{2}\cdot\frac{-S-1}{2}},$$

and the result follows since S ≡ −q (mod 4).


• When (p/q) = (−p/q) = −1, we have (−1/q) = 1, so that q ≡ 1 (mod 4). Therefore
there exists a prime r < q such that (q/r) = −1 by Theorems 8.8 and 8.9. If
r = p, then the result follows, so now assume that r ≠ p. Moreover we must
have (r/q) = −1 or else, since we have already proved the reciprocity law when
(r/q) = 1, this would imply that (q/r) = 1 as q ≡ 1 (mod 4), a contradiction.

Therefore (pr/q) = 1 and so there exists an even integer e, 1 ≤ e ≤ q − 1,
such that e² ≡ pr (mod q), which implies that there exists an integer s with

$$e^2 = pr + qs.$$

Now |s| = |e² − pr|/q < q²/q = q, so the reciprocity law works for any
two of r, p, and s, by the induction hypothesis.

We proceed much as above but now there are four possibilities for d =
(pr, qs) = (pr, s), which we handle all at once: Since d is squarefree and d | pr +
qs = e², we have d | e. We write e = dE, pr = dL, and s = dS so that dE² =
L + qS, and dE², L, qS are pairwise coprime. But then


(=) 7 (“) (=) (<) 1 

d ~B q Ss ; 

from the equations above. Multiplying these all together and reorganizing, 
and then using that pr = dL, we obtain 


(S)(4)- Gs) - GB). 




After some rearrangement this becomes 


OO - AE) OO- A 
since (q/r)(r/q) = 1 by the choice of r. We use the induction hypothesis for the
pairs (r, S), (p, S), and (−L, d) (even when one of L and d is ±1) to obtain

(£) (2) _ l . (me ae oe ee SS

Now S ≡ −qL ≡ −L (mod 4), as 2 divides E and q ≡ 1 (mod 4), and so the
exponent here (the power of −1) is

$$\frac{L+1}{2}\left(\frac{d-1}{2} + \frac{p-1}{2} + \frac{r-1}{2}\right) \equiv \frac{L+1}{2}\cdot\frac{dpr-1}{2} \pmod 2,$$

by exercise 8.7.5. Now dpr = d²L ≡ L (mod 4), so the above exponent is

$$\equiv \frac{L+1}{2}\cdot\frac{L-1}{2} \equiv 0 \pmod 2,$$

and the result follows as q ≡ 1 (mod 4).


Appendix 8D. Dirichlet characters and primes in arithmetic progressions


In this appendix we introduce Dirichlet characters, an important and natural generalization
of the Legendre and Jacobi symbols. Dirichlet characters are used in
many parts of number theory, and we shall highlight this by sketching Dirichlet’s
proof that there are infinitely many primes in any arithmetic progression a (mod q)
with (a, q) = 1. We begin by reviewing some of what we already know about
Legendre and Jacobi symbols, from a slightly different perspective.


8.15. The Legendre symbol and a certain quotient group 


Let p be an odd prime and G_p := (Z/pZ)* be the multiplicative group of reduced
residues mod p. The subgroup of quadratic residues,

$$H_p = G_p^2 = \{g^2 : g \in G_p\} = \left\{a \ (\mathrm{mod}\ p) :\ \left(\frac{a}{p}\right) = 1\right\},$$

is of size (p − 1)/2. This subgroup partitions G_p into two cosets, H_p and bH_p, where
(b/p) = −1. If a ∈ H_p, then (a/p) = 1, and if a ∈ bH_p, then (a/p) = −1. Hence
the Legendre symbol distinguishes between the two equivalence classes in G_p/H_p.
Now G_p/H_p is isomorphic to Z/2Z, which can be written in the multiplicative form
with representatives −1 and 1, that is, the values given by the Legendre symbol.
So one can view the Legendre symbol as the quotient map G_p → G_p/H_p given by
a ↦ (a/p). Such quotients are called characters.

If n = p_1p_2 ⋯ p_k, where the p_i are distinct odd primes, then, by the Chinese
Remainder Theorem, the multiplicative group of reduced residues mod n is given
by

$$G_n = (\mathbb{Z}/n\mathbb{Z})^* \cong (\mathbb{Z}/p_1\mathbb{Z})^* \oplus (\mathbb{Z}/p_2\mathbb{Z})^* \oplus \cdots \oplus (\mathbb{Z}/p_k\mathbb{Z})^*.$$




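This has subgroup H_n := H_{p_1} ⊕ H_{p_2} ⊕ ⋯ ⊕ H_{p_k}, with quotient group

$$G_n/H_n \cong G_{p_1}/H_{p_1} \oplus G_{p_2}/H_{p_2} \oplus \cdots \oplus G_{p_k}/H_{p_k},$$

so that G_n/H_n ≅ (Z/2Z)^k. The quotient map G_n → G_n/H_n can therefore be
given by

$$a \ (\mathrm{mod}\ n) \mapsto \left(\left(\frac{a}{p_1}\right), \left(\frac{a}{p_2}\right), \ldots, \left(\frac{a}{p_k}\right)\right).$$

Now G_n/H_n has the subgroup

$$J_n := \{(\epsilon_1, \ldots, \epsilon_k) \in G_n/H_n :\ \epsilon_1\epsilon_2\cdots\epsilon_k = 1\},$$

which partitions G_n/H_n into two cosets, J_n and bJ_n, where b is a reduced residue
for which (b/n) = −1 (one can find such a b, for instance where (b/p_1) = −1 and
(b/p_i) = 1 for all i ≥ 2, using the Chinese Remainder Theorem). If a ∈ J_n, then
the Jacobi symbol (a/n) = 1, and if a ∈ bJ_n, then (a/n) = −1.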

The Jacobi symbol is defined to be a homomorphism from Z/nZ to C, taking
values inside the multiplicatively closed set {−1, 0, 1}. However, we used it as if it
is a map Z → C; this can be justified, for if a ∈ Z, let ā be the residue class of a
(mod n) and then define (a/n) = (ā/n). We can view this map as the composition
of the maps Z → Z/nZ → C, the first being reduction mod n, the second the Jacobi
symbol. In the next section we see how Dirichlet generalized these ideas.


Exercise 8.15.1.† Suppose that A is a subset of G_p of size (p − 1)/2 that is closed under
multiplication. Prove that A = H_p.


8.16. Dirichlet characters 


The Jacobi symbol restricted to the reduced residues mod n is an example of a
group homomorphism χ : (Z/nZ)* → C*. This means that

$$\chi(ab) = \chi(a)\chi(b) \quad\text{for all } a, b \in (\mathbb{Z}/n\mathbb{Z})^*.$$


Exercise 8.16.1. Deduce that either χ(1) = 1 or χ(a) = 0 for all a ∈ (Z/nZ)*.


Let m be the smallest positive integer for which a^m ≡ 1 (mod n) for all reduced residues
a (mod n). Then χ(a)^m = χ(a^m) = χ(1) = 1, and so the image of χ is a subset
of {z ∈ C : z^m = 1}, the set of mth roots of unity. One can define χ on all
of Z/nZ simply by taking χ(a) = 0 if (a, n) > 1, and then extend it to Z, by
defining χ(a) = χ(ā) where ā is the residue class of a (mod n). Such χ are called
Dirichlet characters, or simply characters. Dirichlet characters play a central role
in advanced number theory. We say that χ is a Dirichlet character of modulus n if
χ(a) = χ(a + n) for all integers a; χ(a) = 0 if (a, n) > 1; and χ : (Z/nZ)* → C* is
a group homomorphism. Define X(n) to be the set of Dirichlet characters mod n.


Exercise 8.16.2.† Given χ, ψ ∈ X(n), define χψ by (χψ)(a) = χ(a)ψ(a) for all integers a, and
then prove that χψ ∈ X(n). Prove that χ̄, defined by letting χ̄(a) be the complex conjugate
of χ(a), is a character. Prove that X(n) forms a group under multiplication.


If n = p is a prime, let g be a primitive root mod p. For any reduced residue
a (mod p) there exists an integer k for which a ≡ g^k (mod p), and so if χ ∈ X(p),
then χ(a) = χ(g^k) = χ(g)^k; that is, the values of χ(a) are all determined from the
value of χ(g). The only restriction on the value of χ(g) is that χ(g)^{p−1} = χ(g^{p−1}) =
χ(1) = 1, and therefore χ(g) can be any (p − 1)st root of unity. There are therefore
p − 1 distinct characters mod p, one for each (p − 1)st root of unity.


For example, if p = 2, we have just one character, and so this must be the
character χ for which χ(a) = 1 whenever a is odd. In fact, for every modulus n one
has the principal character (mod n), denoted χ_0, for which χ_0(a) = 1 whenever
(a, n) = 1. If p = 3, then there is one other character, and so it must be the
Legendre symbol (·/3). For p = 5 there are four characters: the principal character,
the Legendre symbol (·/5), and two others. Now 2 is a primitive root mod 5, and
so the characters are determined by the value of χ(2), which must be a fourth
root of unity. For the two characters we already know about, χ(2) = 1 and −1,
respectively, so the other two must have χ(2) = i and −i, respectively. We see that
these two characters take on complex values. In general χ takes only real values
if and only if χ(g) ∈ R (in which case we call χ a real character). But χ(g) is a
(p − 1)st root of unity and so either χ(g) = 1, in which case χ = χ_0, or χ(g) = −1,
in which case χ(a) = (a/p), the Legendre symbol mod p.
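The construction described above, χ_t(g^k) = e^{2πitk/(p−1)} for t = 0, 1, ..., p − 2, translates directly into code. This sketch (names are ours) finds the least primitive root by brute force, records discrete logarithms, and builds all p − 1 characters mod p; for p = 5 it recovers χ(2) = 1, i, −1, −i, and the character with t = (p − 1)/2 is the Legendre symbol. Values are floating-point complex numbers, so comparisons use a tolerance.

```python
import cmath


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def discrete_logs(p):
    """Return (g, log): g is the least primitive root mod the odd prime p, log[g^k % p] = k."""
    for g in range(2, p):
        log, x = {}, 1
        for k in range(p - 1):
            if x in log:  # powers of g repeat early, so g is not a primitive root
                break
            log[x] = k
            x = x * g % p
        if len(log) == p - 1:
            return g, log
    raise ValueError("p must be an odd prime")


def characters_mod_p(p):
    """All p - 1 characters mod p as lists: chars[t][a] = chi_t(a), with chi_t(0) = 0."""
    g, log = discrete_logs(p)
    chars = [[0] + [cmath.exp(2j * cmath.pi * t * log[a] / (p - 1)) for a in range(1, p)]
             for t in range(p - 1)]
    return g, chars


g, chars = characters_mod_p(5)
print(g, [chars[t][2] for t in range(4)])  # chi_t(2) is 1, i, -1, -i up to rounding
```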


The same argument determines X(n) whenever (Z/nZ)* has a primitive root,
that is, when n = 1, 2, 4, p^e, or 2p^e for some odd prime p and e ≥ 1, as we
saw in Corollary 7.18.1. In this case χ(g) is a φ(n)th root of unity, and its value
determines χ. For example, if n = 4, there are two characters, one of which is the
principal character, which is the same as the principal character mod 2. Since −1
is a primitive root mod 4, the other has χ(−1) = −1 and χ(1) = 1, and therefore
χ(a) = (−1/a). By definition, characters mod p^e are also characters mod p^f for all
f ≥ e. Since there can be just two real characters mod p^e, they must therefore be
χ_0 and (·/p).

In general if (Z/nZ)* is generated by elements g_1, ..., g_r of orders m_1, ..., m_r,
then

$$(\mathbb{Z}/n\mathbb{Z})^* = \{g_1^{a_1}\cdots g_r^{a_r} :\ 0 \le a_i \le m_i - 1 \text{ for each } i\}; \tag{8.16.1}$$

and so χ(a) = χ(g_1)^{a_1} ⋯ χ(g_r)^{a_r}. That is, χ is completely determined by the values
of χ(g_1), ..., χ(g_r), where χ(g_j) is an arbitrary m_jth root of unity.
Exercise 8.16.3. (a)† Use the Chinese Remainder Theorem to show that if n = ∏_p p^{e_p}, then
X(n) = {∏_p χ_p : χ_p ∈ X(p^{e_p})}.
(b) Deduce that X(n) has φ(n) elements.
(c) Show that if χ ∈ X(n), then χ^{λ(n)} = χ_0, where λ(n) is Carmichael’s function (as defined in
section 7.17 of appendix 7B).
(d) Show that if χ ∈ X(n) is non-principal, then there exists a (mod n) such that χ(a) ≠ 0 or
1.
(e) Prove that if p is an odd prime and m ≢ 0 or 1 (mod p), then there exists χ ∈ X(p) such
that χ(m) ≠ 1.
(f) Prove that if (m, n) = 1 and m ≢ 1 (mod n), then there exists χ ∈ X(n) such that
χ(m) ≠ 1.


Exercise 8.16.3(a) shows that we can determine the characters mod n by determining
the characters modulo the prime power divisors of n. After our discussion
above, it remains to understand X(2^e) for e ≥ 3. In the proof of Corollary 7.17.2
we saw that (Z/2^eZ)* is generated by −1 and 3, which have orders 2 and 2^{e−2},
respectively. Therefore there are four real characters mod 2^e with e ≥ 3, namely
χ_0 and (−1/·), and we have determined the other two: (2/·) in (8.7.2), and (−2/·) when
multiplying this with (8.7.1).

We saw that (·/p) is a character mod p² as well as mod p. Given a character,
it makes sense to determine the smallest modulus for the character, in this case p,
which we call the conductor of the character. There is another way to trivially construct a
character of larger modulus from a given character: Given a character χ (mod m),
we can multiply it by the principal character χ_0 (mod n) to obtain a new character
χχ_0 (mod [m, n]). We call this new character an induced character. Any character
that cannot be induced from another is called primitive. If χ is induced from a
primitive character of modulus n, then χ has conductor n.

Since, for example, (3/13) = −(3/7) although 13 ≡ 7 (mod 6), we see that (3/·) cannot
have period 3 or even 6. In fact it has period 12, since (3/·) = (−1/·)(·/3), the product
of characters of moduli 4 and 3, respectively. More generally if q is an odd, squarefree
integer, then (q/·) has modulus q if q ≡ 1 (mod 4). However, if q ≡ 3 (mod 4), then
(q/·) = (−1/·)(·/q). The first is a character of period 4, and so (q/·) has period 4q. The
character mod 4q is then given by (−1/·)(·/q), using exercise 8.16.3.


Exercise 8.16.4.4 (a) Prove that if n is a squarefree odd integer, then the Jacobi symbol (<=) 
is primitive. 
(b) Prove that the real, primitive characters are given by 1, and for each squarefree, odd n: 
e The Jacobi symbol (<) of modulus n, if n > 1. 
e The symbol (2”) of modulus 8|n|. 
e The symbol (4) of modulus 4|n|, if n = 3 (mod 4). 


We will now show that if $\chi$ is a non-principal character mod $n$, then the sum

$$\sum_{m \,(\mathrm{mod}\, n)} \chi(m) = 0. \tag{8.16.2}$$

First note that as $\chi(m) = 0$ whenever $(m,n) > 1$, we get the same value when we restrict the sum to reduced residues $m \pmod n$, a sum which we denote by $S$. If we multiply each term in $S$ through by $\chi(g)$ where $(g,n) = 1$, then we get the sum of $\chi(g)\chi(m) = \chi(gm)$ over the reduced residues $m \pmod n$, which is just a rearrangement of $S$ (as $\{gm \pmod n : (m,n) = 1\}$ is a rearrangement of the reduced residues mod $n$). Therefore $\chi(g)S = S$. However, as $\chi$ is a non-principal character, we can select $g$ so that $\chi(g) \neq 1$ by exercise 8.16.3(d), and therefore $S = 0$.
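As a numerical sanity check of (8.16.2), here is a minimal Python sketch. It assumes the modulus is a prime $p$, so that $(\mathbb{Z}/p\mathbb{Z})^*$ is cyclic and each character is determined by its value at a primitive root; the helper names `primitive_root` and `characters_mod_prime` are hypothetical, and floating-point complex numbers stand in for exact roots of unity, so the sums vanish only up to rounding error.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo the prime p (brute force; fine for small p)."""
    for g in range(2, p):
        # g generates (Z/pZ)* exactly when its powers hit all p - 1 reduced residues
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    return 1  # p = 2: the trivial group {1}


def characters_mod_prime(p):
    """All p - 1 Dirichlet characters mod the prime p, each as a list chi[m], 0 <= m < p."""
    g = primitive_root(p)
    index = {pow(g, k, p): k for k in range(p - 1)}  # discrete logarithm to base g
    chars = []
    for r in range(p - 1):
        chi = [0j] * p  # chi(m) = 0 whenever p divides m
        for m, k in index.items():
            # chi_r(g^k) = exp(2 pi i r k / (p - 1)); r = 0 is the principal character
            chi[m] = cmath.exp(2j * cmath.pi * r * k / (p - 1))
        chars.append(chi)
    return chars


for p in (3, 5, 7, 11, 13):
    # largest |sum_m chi(m)| over the non-principal characters: rounding-error sized
    worst = max(abs(sum(chi)) for chi in characters_mod_prime(p)[1:])
    print(p, worst)
```

The construction mirrors the fact that a character mod a prime is fixed by where it sends a generator; for composite moduli one would need the full group structure of $(\mathbb{Z}/n\mathbb{Z})^*$.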


Exercise 8.16.5. Reprove (8.16.2) using the representation of the reduced residues given in (8.16.1).




Finally we prove the “dual” estimate to (8.16.2).⁸ For every reduced residue $m \pmod n$ we have

$$\frac{1}{\phi(n)} \sum_{\chi \in X(n)} \chi(m) = \begin{cases} 1 & \text{if } m \equiv 1 \pmod n, \\ 0 & \text{otherwise.} \end{cases} \tag{8.16.3}$$

This is immediate when $m = 1$. For all other reduced residues $m \pmod n$ there exists a character $\chi_1 \pmod n$ for which $\chi_1(m) \neq 1$ (exercise 8.16.3). If we multiply our sum through by $\chi_1(m)$, then the $\chi$-term becomes $\chi_1(m)\chi(m) = (\chi_1\chi)(m)$ and $\{\chi_1\chi : \chi \in X(n)\} = X(n)$ as $X(n)$ forms a group under multiplication. Hence if the sum is $S$, then $\chi_1(m)S = S$, which implies that $S = 0$ as $\chi_1(m) \neq 1$, as claimed.
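The dual relation (8.16.3) can be checked the same way. This sketch again assumes a prime modulus $p$ and repeats the hypothetical helpers `primitive_root` and `characters_mod_prime` so that it runs on its own; `character_average` is likewise an illustrative name.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo the prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    return 1


def characters_mod_prime(p):
    """All characters mod the prime p, as lists chi[m] for 0 <= m < p."""
    g = primitive_root(p)
    index = {pow(g, k, p): k for k in range(p - 1)}
    chars = []
    for r in range(p - 1):
        chi = [0j] * p
        for m, k in index.items():
            chi[m] = cmath.exp(2j * cmath.pi * r * k / (p - 1))
        chars.append(chi)
    return chars


def character_average(m, p):
    """(1/phi(p)) * sum over all characters chi mod p of chi(m), as in (8.16.3)."""
    chars = characters_mod_prime(p)
    return sum(chi[m % p] for chi in chars) / (p - 1)


p = 11
for m in range(1, p):
    # expected: about 1 for m = 1, about 0 for every other reduced residue
    print(m, round(abs(character_average(m, p)), 9))
```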


8.17. Dirichlet series and primes in arithmetic progressions 


In 1837 Dirichlet proved that for any given integer $q \geq 2$ and any integer $a$ with $(a,q) = 1$, there are infinitely many primes $\equiv a \pmod q$. We have seen how to prove this for specific $a$ and $q$ (or unions of $a \pmod q$) throughout this book (see, e.g., exercise 8.9.18), but these elementary methods seem incapable of reproving Dirichlet’s result. Dirichlet built an extraordinary mechanism to prove his result, which became the basis for much of modern number theory.⁹


In section 4.9 of appendix 4B we introduced the Riemann zeta-function and discussed the convergence of more general series. Dirichlet introduced the Dirichlet $L$-functions, in which the coefficient of $1/n^s$ is now $\chi(n)$, a Dirichlet character:

$$L(s,\chi) := \sum_{n \geq 1} \frac{\chi(n)}{n^s}.$$

Since each $|\chi(n)| \leq 1$, this is absolutely convergent whenever $\mathrm{Re}(s) > 1$ (which can be proved by modifying the proof we gave for $\zeta(s)$ in section 5.12 of appendix 5B), and so the infinite sum is well-defined in this domain. It will require something more to define it for values of $s$ on the line $\mathrm{Re}(s) = 1$, or further to the left in the complex plane, and that something is provided by (8.16.2). The idea is that if $\chi$ has modulus $q$, then we can sum $q$ terms at a time, which will allow us to make sense of the definition of $L(s,\chi)$ in a wider region. We will develop this in section of appendix 13B. We will therefore be able to show that, for each non-principal character $\chi$,

$$\sum_{n \leq x} \frac{\chi(n)}{n} \text{ converges to } L(1,\chi) \text{ as } x \to \infty.$$
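To see this convergence numerically, the sketch below takes the non-principal character mod 4 (the function name `chi4` is our own). For this character the limit is the classical value $L(1,\chi) = \pi/4$ of Leibniz's series, a fact we import rather than something proved in this section; the terms must be summed in order, exactly as stated above.

```python
import math


def chi4(n):
    """The non-principal character mod 4: 0 for even n, +1 if n = 1 mod 4, -1 if n = 3 mod 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def partial_sum(chi, x):
    """Sum of chi(n)/n over 1 <= n <= x, with the terms taken in order."""
    return sum(chi(n) / n for n in range(1, x + 1))


for x in (10, 100, 1000, 10**5):
    print(x, partial_sum(chi4, x), math.pi / 4)
```

Because the terms alternate in sign and decrease in size, the error after $x$ terms is at most about $1/x$ here.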


⁸A careful study of the proofs in this section reveals that we use little more than the fact that $(\mathbb{Z}/n\mathbb{Z})^*$ is a multiplicative group to construct the set of characters $X(n)$. This theory, suitably modified, can be elegantly extended to all groups, even non-abelian groups. These ideas lead far beyond the scope of this book.

⁹The creativity needed to resolve a major question in number theory often yields tools that are so potent that they apply to a vast number of situations of which the original author-creator had no inkling.




One can generalize (5.12.1) to obtain

$$\sum_{n \geq 1} \frac{\chi(n)}{n^s} = \prod_{p \text{ prime}} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} \tag{8.17.1}$$

when $\mathrm{Re}(s) > 1$. We have just seen that the left-hand side converges at $s = 1$ provided the terms are taken in order. This does not automatically tell us anything about the convergence of the infinite product on the right-hand side at $s = 1$, but one can show that if $L(1,\chi) \neq 0$ (more on that in section of appendix 12D when $\chi$ is a real non-principal character), then one can take logarithms in (8.17.1) to deduce that

$$\sum_{\substack{p \text{ prime} \\ p \leq x}} \frac{\chi(p)}{p} \text{ converges as } x \to \infty. \tag{8.17.2}$$
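Here is a small numerical illustration of the Euler product (8.17.1) at $s = 2$, a point where both sides converge absolutely, for the non-principal character mod 4. We truncate both sides at $N = 10^5$; the common value is Catalan's constant $\approx 0.9159655942$ (a known value we quote, not derived here). Function names are illustrative.

```python
import math


def chi4(n):
    """The non-principal character mod 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


N, s = 10**5, 2
series = sum(chi4(n) / n**s for n in range(1, N + 1))  # left side of (8.17.1), truncated
product = 1.0
for p in primes_up_to(N):  # right side of (8.17.1), truncated to primes <= N
    product /= 1 - chi4(p) / p**s
print(series, product)
```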


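Dirichlet attacked primes in arithmetic progressions by using (8.16.3) to identify primes $p$ for which $p \equiv a \pmod q$. The trick is to rewrite this condition as $pa^{-1} \equiv 1 \pmod q$, so that $\chi(pa^{-1}) = \overline{\chi}(a)\chi(p)$, and therefore

$$\frac{1}{\phi(q)} \sum_{\chi \in X(q)} \overline{\chi}(a)\chi(p) = \begin{cases} 1 & \text{if } p \equiv a \pmod q, \\ 0 & \text{otherwise.} \end{cases}$$

This allowed Dirichlet to note that

$$\sum_{\substack{p \text{ prime},\ p \leq x \\ p \equiv a \pmod q}} \frac{1}{p} = \sum_{\substack{p \text{ prime} \\ p \leq x}} \frac{1}{p} \cdot \frac{1}{\phi(q)} \sum_{\chi \in X(q)} \overline{\chi}(a)\chi(p) = \frac{1}{\phi(q)} \sum_{\chi \in X(q)} \overline{\chi}(a) \sum_{\substack{p \text{ prime} \\ p \leq x}} \frac{\chi(p)}{p}.$$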

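Now (8.17.2) implies that all of the terms on the right-hand side with $\chi$ non-principal are bounded. The sum in the $\chi_0$-term is the sum of $1/p$ over all primes $p \leq x$ less a finite number of terms (the prime divisors $p$ of $q$). Therefore

$$\sum_{\substack{p \text{ prime},\ p \leq x \\ p \equiv a \pmod q}} \frac{1}{p} \quad \text{equals} \quad \frac{1}{\phi(q)} \sum_{\substack{p \text{ prime} \\ p \leq x}} \frac{1}{p} + \text{a bounded quantity}.$$

Now, in Euler’s proof of the infinitude of primes, (5.12.3), we saw that the sum $\sum_{p \leq x} 1/p$ diverges to $\infty$. Therefore

$$\sum_{\substack{p \text{ prime},\ p \leq x \\ p \equiv a \pmod q}} \frac{1}{p} \text{ diverges to } \infty \text{ as } x \to \infty;$$

that is, there are infinitely many primes $p \equiv a \pmod q$. Moreover we can even deduce that

$$\sum_{\substack{p \text{ prime},\ p \leq x \\ p \equiv a \pmod q}} \frac{1}{p} \Bigg/ \sum_{\substack{p \text{ prime},\ p \leq x \\ p \equiv b \pmod q}} \frac{1}{p} \;\to\; 1 \quad \text{as } x \to \infty$$

for any integers $a, b$ which are coprime to $q$.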
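For $q = 4$ the argument predicts that the two sums grow without bound while their difference stays bounded (the difference is exactly $\sum_{p \leq x} \chi(p)/p$ for the character mod 4). The sketch below, with illustrative names, shows this; note that the ratio approaches 1 only very slowly, since each sum grows roughly like $\frac{1}{2}\log\log x$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def sums_mod_4(x):
    """Return (sum of 1/p over primes p <= x with p = 1 mod 4, the same for p = 3 mod 4)."""
    s1 = s3 = 0.0
    for p in primes_up_to(x):
        if p % 4 == 1:
            s1 += 1 / p
        elif p % 4 == 3:
            s3 += 1 / p
    return s1, s3


for x in (10**3, 10**4, 10**5, 10**6):
    s1, s3 = sums_mod_4(x)
    print(x, round(s1, 4), round(s3, 4), round(s1 - s3, 4), round(s1 / s3, 4))
```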




In these early results we “weight” each prime with $1/p$, which seems unnatural. This can be removed. Indeed, Dirichlet’s proof can be combined with the ideas in the proof of the prime number theorem to show that for each $(a,q) = 1$

$$\frac{\#\{p \text{ prime},\ p \leq x, \text{ and } p \equiv a \pmod q\}}{\#\{p \text{ prime},\ p \leq x\}} \to \frac{1}{\phi(q)} \quad \text{as } x \to \infty.$$
In other words, if x is sufficiently large, there are roughly the same number of 
primes in each arithmetic progression a (mod q) with (a,q) = 1. The proof of this 
will be a central theme in [Graal. 
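A quick count makes this equidistribution visible. The following sketch (names are illustrative) computes, for a given modulus $q$, the proportion of primes up to $x$ in each reduced residue class; each proportion should be close to $1/\phi(q)$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def class_proportions(q, x):
    """Fraction of the primes p <= x lying in each reduced residue class a mod q."""
    primes = primes_up_to(x)
    counts = {a: 0 for a in range(q) if math.gcd(a, q) == 1}
    for p in primes:
        if p % q in counts:  # primes dividing q fall in no reduced class
            counts[p % q] += 1
    return {a: c / len(primes) for a, c in counts.items()}


print(class_proportions(5, 10**6))  # each value should be near 1/phi(5) = 1/4
```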


In the next two exercises one applies Dirichlet’s Theorem on primes in arithmetic progressions.


Exercise 8.17.1. (a) Prove that there are infinitely many integers $n$ for which $\mu(n) + \mu(n+1) = -1$.
(b) Prove that there are infinitely many integers $n$ for which $\mu(n) + \mu(n+1) = 1$.


Exercise 8.17.2. (a) Given $\sigma \in \{-1,1\}$ and prime $q$, show that there are infinitely many primes $p$ for which $(\frac{q}{p}) = \sigma$.
(b) Given $\sigma_1, \ldots, \sigma_k \in \{-1,1\}$ and primes $q_1, \ldots, q_k$, show that there are infinitely many primes $p$ for which $(\frac{q_i}{p}) = \sigma_i$ for $i = 1, 2, \ldots, k$.
(c) Prove that for any given integer B there are infinitely many primes p such that there are 


no integers n,1 <n < B, for which () = (4+) — (2%). 


We will return to the question of finding strings of consecutive quadratic residues with more 
sophisticated tools in section 14.6. 


Uniformity questions 


Above we noted that, following the work of Dirichlet and the proof of the prime number theorem, one knows that the primes are more or less equally distributed among the $\phi(q)$ reduced residue classes mod $q$; that is, if $(a,q) = 1$, then

$$\#\{p \text{ prime},\ p \leq x,\ p \equiv a \pmod q\} \sim \frac{\pi(x)}{\phi(q)} \tag{8.17.3}$$

as $x \to \infty$. The important issue in many applications is to estimate how large $x$ needs to be for (8.17.3) to hold. To be precise, how large does $x$ need to be for the ratio of the two sides to be guaranteed to be between $1 - \eta$ and $1 + \eta$ for some very small $\eta > 0$? Calculations suggest that (8.17.3) holds once $x > q^{1+\epsilon}$, but there are no ideas around as to how to prove this. The best that (suitable) generalizations of the Riemann Hypothesis can offer is a range like $x > q^{2+\epsilon}$. The results that have been proved unconditionally are a lot weaker: that (8.17.3) holds in the range $x > \exp(q^{\epsilon})$.


Appendix 8E. Quadratic reciprocity and recurrence sequences


8.18. The Fibonacci numbers modulo p 


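Using the binomial theorem, for each $n \geq 0$ we have

$$\frac{(x+y)^n - (x-y)^n}{2y} = \frac{1}{2y}\sum_{j=0}^{n} \binom{n}{j} x^{n-j}\big(y^j - (-y)^j\big) = \sum_{\substack{1 \leq j \leq n \\ j \text{ odd}}} \binom{n}{j} x^{n-j} y^{j-1}.$$

Writing $j = 1 + 2k$, with $x = 1/2$ and $y = \sqrt{5}/2$ we deduce that

$$F_n = \frac{1}{2^{n-1}} \sum_{0 \leq k \leq \frac{n-1}{2}} \binom{n}{2k+1} 5^k.$$

Now if $n = p$, where $p$ is an odd prime, then $p$ divides each $\binom{p}{2k+1}$ except when $k = \frac{p-1}{2}$. Therefore

$$F_p \equiv \frac{5^{\frac{p-1}{2}}}{2^{p-1}} \equiv \left(\frac{5}{p}\right) \pmod p,$$

using Euler’s criterion (and Fermat’s Little Theorem for $2^{p-1}$). If $n = p+1$, then $p$ divides each $\binom{p+1}{2k+1}$ except when $k = 0$ and $k = \frac{p-1}{2}$, so that

$$F_{p+1} \equiv \frac{1}{2^p}\left(\binom{p+1}{1} + \binom{p+1}{p} 5^{\frac{p-1}{2}}\right) \equiv \frac{1}{2}\left(1 + \left(\frac{5}{p}\right)\right) \pmod p.$$

Therefore if $(\frac{5}{p}) = 1$, then $F_p \equiv F_{p+1} \equiv 1 \pmod p$, and $F_{p-1} = F_{p+1} - F_p \equiv 0 \pmod p$. Therefore $F_{p-1} \equiv F_0 \pmod p$, $F_p \equiv F_1 \pmod p$, and $F_{p+1} \equiv F_2 \pmod p$.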
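The binomial formula for $F_n$ and the two congruences for $F_p$ and $F_{p+1}$ are easy to spot-check. In this sketch (function names are our own) the Legendre symbol is computed by Euler's criterion, and we skip $p = 5$, where $(5/p) = 0$.

```python
from math import comb


def fib(n):
    """F_n with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1


def binomial_formula(n):
    """2^(n-1) F_n computed as the sum over 0 <= k <= (n-1)/2 of C(n, 2k+1) 5^k."""
    return sum(comb(n, 2 * k + 1) * 5**k for k in range((n - 1) // 2 + 1))


for p in (3, 7, 11, 13, 17, 19, 23, 29, 31):
    s = legendre(5, p)
    # expect F_p = s and 2 F_{p+1} = 1 + s, both modulo p
    print(p, s, fib(p) % p, 2 * fib(p + 1) % p)
```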




Exercise 8.18.1. Deduce that if $(5/p) = 1$, then $F_{n+p-1} \equiv F_n \pmod p$ for all $n \geq 0$.




On the other hand if $(5/p) = -1$, then $F_p \equiv -1 \pmod p$ and $F_{p+1} \equiv 0 \pmod p$, so that $F_{p+2} \equiv F_{p+3} \equiv -1 \pmod p$.


Exercise 8.18.2. Deduce that if $(5/p) = -1$, then $F_{n+2p+2} \equiv F_n \pmod p$ for all $n \geq 0$.
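The two exercises predict that the period of $F_n \pmod p$ (often called the Pisano period, a name not used in the text) divides $p - 1$ when $(5/p) = 1$ and divides $2p + 2$ when $(5/p) = -1$. The sketch below computes the period directly and compares; the function names are illustrative.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def fib_period(p):
    """Least d >= 1 with (F_d, F_{d+1}) = (0, 1) mod p, i.e. the period of F_n mod p."""
    a, b = 0, 1  # (F_0, F_1) mod p
    for d in range(1, 6 * p + 10):
        a, b = b, (a + b) % p  # now (a, b) = (F_d, F_{d+1}) mod p
        if (a, b) == (0, 1):
            return d
    raise AssertionError("period not found")


def predicted_multiple(p):
    """p - 1 if (5/p) = 1 (exercise 8.18.1), 2p + 2 if (5/p) = -1 (exercise 8.18.2)."""
    return p - 1 if legendre(5, p) == 1 else 2 * p + 2


for p in (3, 7, 11, 13, 17, 19, 29, 31, 41, 59, 61):
    print(p, fib_period(p), predicted_multiple(p))
```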


Exercise 8.18.3.† Let $x_0 = 0$, $x_1 = 1$, and $x_{n+2} = ax_{n+1} + bx_n$ for all $n \geq 0$, and let $\Delta := a^2 + 4b$. Suppose that prime $p$ does not divide $a\Delta$.
(a) Show that if $(\Delta/p) = 1$, then $x_{n+p-1} \equiv x_n \pmod p$ for all $n \geq 0$.
(b) Show that if $(\Delta/p) = -1$, then $x_{n+p+1} \equiv -bx_n \pmod p$ for all $n \geq 0$.
(c) Deduce that there exists a positive integer $d \leq p^2 - 1$ such that $x_{n+d} \equiv x_n \pmod p$ for all $n \geq 0$.


Exercise 8.18.4.† Let $x_0 = 0$, $x_1 = 1$, and $x_{n+2} = ax_{n+1} + bx_n$ for all $n \geq 0$, and let $\Delta := a^2 + 4b$.
(a) Show that $x_n \equiv n(a/2)^{n-1} \pmod \Delta$ for all $n \geq 1$.
(b) Deduce that if $p$ divides $\Delta$, then $x_{n+p} \equiv \frac{a}{2}x_n \pmod p$ for all $n \geq 0$.
(c) Prove that if $p \mid a$, then $x_{n+p-1} \equiv (\frac{b}{p})\, x_n \pmod p$ for all $n \geq 0$.
We can combine the previous two exercises to get the following useful general result (checking the $p = 2$ case carefully):


Corollary 8.18.1. Let $x_0 = 0$, $x_1 = 1$, and $x_{n+2} = ax_{n+1} + bx_n$ for all $n \geq 0$. Let $\Delta := a^2 + 4b$. For every prime $p$ we have

$$x_{p - (\frac{\Delta}{p})} \equiv 0 \pmod p,$$

except if $p = 2$, in the cases for which $a$ is odd and $b$ is even.


Exercise 8.18.5.† Let $(x_n)_{n \geq 0}$ be as in exercise 8.18.3 with $(a,b) = 1$, and let $p$ be a prime.
(a) Prove that $(x_n, b) = 1$ for all $n \geq 1$.
(b) Prove that $(x_n, x_{n+1}) = 1$ for all $n \geq 0$.
(c) By considering the possible pairs $\{x_n, x_{n+1}\} \pmod p$, prove that there exists a positive integer $d \leq p^2 - 1$ such that $x_{n+d} \equiv x_n \pmod p$ for all $n \geq 0$.


Exercise 8.18.6.† Let $r$ be the smallest integer $\geq 1$ for which a given prime $p$ divides $F_r$.
(a) Use exercises 0.4.10 and 2.5.19(c) to show that $F_{mr} \equiv mF_{r+1}^{m-1}F_r \pmod{F_r^2}$ and $F_{mr+1} \equiv F_{r+1}^{m} \pmod{F_r^2}$.
(b) Suppose that $F_n \pmod p$ has period $k$. Deduce that $F_n \pmod{p^2}$ has period $k$ or $kp$.
(c) Deduce that the period of $F_n \pmod{p^2}$ divides $p(p-1)$ if $(5/p) = 1$, and it divides $2p(p+1)$ if $(5/p) = -1$.


8.19. General second-order linear recurrence sequences modulo p 


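Let $x_0 = 0$, $x_1 = 1$, and $x_n = ax_{n-1} + bx_{n-2}$ for all $n \geq 2$. In this section we think of $x_n \in \mathbb{Z}[a,b]$, a polynomial in $a$ and $b$, and let $\Delta = a^2 + 4b$. We shall study the values of $x_n \pmod p$ in $\mathbb{Z}[a,b]$.


Theorem 8.11. If $p$ is an odd prime, then $x_p \equiv \Delta^{(p-1)/2} \pmod p$,

$$x_{p-1} \equiv -2a \prod_{\substack{r \,(\mathrm{mod}\, p) \\ (\frac{r}{p}) = 1 \\ r \not\equiv 1 \,(\mathrm{mod}\, p)}} (a^2 - r\Delta) \pmod p,$$

and

$$x_{p+1} \equiv \frac{a}{2} \prod_{\substack{r \,(\mathrm{mod}\, p) \\ (\frac{r}{p}) = -1}} (a^2 - r\Delta) \pmod p.$$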




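We deduce that

$$\Delta x_{p-1}x_{p+1} \equiv a^{2p} - a^2b^{p-1} \equiv \prod_{k \,(\mathrm{mod}\, p)} (a^2 - kb) \pmod p.$$


Proof. As above we expand

$$2^n x_n = \frac{1}{\sqrt{\Delta}}\left((a+\sqrt{\Delta})^n - (a-\sqrt{\Delta})^n\right)$$

using the binomial theorem, so that

$$2^p x_p = \frac{1}{\sqrt{\Delta}} \sum_{k=1}^{p} \binom{p}{k} a^{p-k} \Delta^{k/2}\left(1 - (-1)^k\right) \equiv 2\Delta^{(p-1)/2} \pmod p$$

as $p$ divides the binomial coefficients $\binom{p}{k}$ for $1 \leq k \leq p-1$, and the first result follows by dividing through by 2 and using Fermat’s Little Theorem. We also use the binomial expansion of the $p$th power to study $x_{p-1}$ and $x_{p+1} \pmod p$:

$$\begin{aligned} 2^{p+1}x_{p+1} &= \frac{1}{\sqrt{\Delta}}\left((a+\sqrt{\Delta})(a+\sqrt{\Delta})^p - (a-\sqrt{\Delta})(a-\sqrt{\Delta})^p\right) \\ &= \sum_{k=0}^{p} \binom{p}{k} a^{p+1-k}\Delta^{(k-1)/2}\left(1-(-1)^k\right) + \sum_{k=0}^{p}\binom{p}{k} a^{p-k}\Delta^{k/2}\left(1+(-1)^k\right) \\ &\equiv 2a\Delta^{(p-1)/2} + 2a^p = 2a\left((a^2)^{(p-1)/2} + \Delta^{(p-1)/2}\right) \pmod p; \end{aligned}$$

and multiplying through by $-4b = (a+\sqrt{\Delta})(a-\sqrt{\Delta})$ we also have

$$\begin{aligned} -4b \cdot 2^{p-1}x_{p-1} &= \frac{1}{\sqrt{\Delta}}\left((a-\sqrt{\Delta})(a+\sqrt{\Delta})^p - (a+\sqrt{\Delta})(a-\sqrt{\Delta})^p\right) \\ &= \sum_{k=0}^{p} \binom{p}{k} a^{p+1-k}\Delta^{(k-1)/2}\left(1-(-1)^k\right) - \sum_{k=0}^{p}\binom{p}{k} a^{p-k}\Delta^{k/2}\left(1+(-1)^k\right) \\ &\equiv 2a\Delta^{(p-1)/2} - 2a^p = -2a\left((a^2)^{(p-1)/2} - \Delta^{(p-1)/2}\right) \pmod p. \end{aligned}$$

The next two parts of the result now follow from (8.2.1) and (8.2.2). The final part comes from multiplying together these last two displayed equations, and then by $\Delta$, to obtain

$$\begin{aligned} 4^p b\Delta\, x_{p-1}x_{p+1} &\equiv (a^2)^p\Delta - a^2\Delta^p \\ &\equiv a^{2p}(a^2+4b) - a^2(a^{2p} + 4b^p) = 4(a^{2p}b - a^2b^p) \pmod p, \end{aligned}$$

and the result follows by dividing through by $4b$ (if $p$ does not divide $b$), since $4^p \equiv 4 \pmod p$.

In the case that $p$ divides $b$ we have $x_n \equiv a^{n-1} \pmod p$ for all $n \geq 1$, and so

$$\Delta x_{p-1}x_{p+1} \equiv a^2 \cdot a^{p-2} \cdot a^p = a^{2p} \pmod p.$$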
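Since the congruences of Theorem 8.11 hold in $\mathbb{Z}[a,b]$ modulo $p$, they also hold after substituting any integers for $a$ and $b$. The sketch below (hypothetical helper names; `pow(2, -1, p)` needs Python 3.8 or later) spot-checks all four statements for small integer values of $a$, $b$ and small odd primes $p$. It is a consistency check, not a proof.

```python
def recurrence_mod(a, b, n, p):
    """[x_0, ..., x_n] mod p for x_0 = 0, x_1 = 1, x_k = a x_{k-1} + b x_{k-2} (n >= 1)."""
    xs = [0, 1]
    for _ in range(n - 1):
        xs.append((a * xs[-1] + b * xs[-2]) % p)
    return xs


def check_theorem_8_11(a, b, p):
    """True if all four congruences of Theorem 8.11 (and its corollary) hold for a, b, p."""
    x = recurrence_mod(a, b, p + 1, p)
    D = (a * a + 4 * b) % p  # Delta mod p
    half = pow(2, -1, p)  # inverse of 2 mod p
    euler = lambda r: pow(r, (p - 1) // 2, p)  # 1 for residues, p - 1 for non-residues
    prod_res = prod_non = prod_all = 1
    for r in range(1, p):
        if euler(r) == 1 and r != 1:
            prod_res = prod_res * (a * a - r * D) % p
        elif euler(r) == p - 1:
            prod_non = prod_non * (a * a - r * D) % p
    for k in range(p):
        prod_all = prod_all * (a * a - k * b) % p
    return (
        x[p] == pow(D, (p - 1) // 2, p)
        and x[p - 1] == (-2 * a * prod_res) % p
        and x[p + 1] == a * half * prod_non % p
        and D * x[p - 1] * x[p + 1] % p == prod_all
    )


print(all(check_theorem_8_11(a, b, p)
          for p in (3, 5, 7, 11, 13) for a in range(-4, 5) for b in range(-4, 5)))
```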


8.20. Prime values in recurrence sequences 


It is conjectured that there are infinitely many Mersenne primes $2^p - 1$ as well as Fibonacci primes $F_p$. There are 33 known Fibonacci primes. The first few are $F_3 = 2$, $F_4 = 3$, $F_5 = 5$, $F_7 = 13$, $F_{11} = 89$, $F_{13} = 233$, $F_{17} = 1597$, $F_{23} = 28657, \ldots$. Notice that $F_{19} = 4181 = 37 \times 113$ is composite.
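To reproduce the start of this list, the sketch below tests each $F_n$ for primality with the Miller–Rabin test, a standard technique not discussed in the text; with the twelve prime bases used here it is known to be deterministic for all $n$ below about $3.3 \times 10^{24}$, far beyond the sizes involved. Function names are illustrative.

```python
def is_prime(n):
    """Miller-Rabin with the first twelve prime bases (deterministic for n < 3.3e24)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for q in bases:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def fibonacci_prime_indices(limit):
    """All n <= limit for which F_n is prime."""
    a, b = 0, 1  # F_0, F_1
    out = []
    for n in range(1, limit + 1):
        a, b = b, a + b  # now a = F_n
        if is_prime(a):
            out.append(n)
    return out


print(fibonacci_prime_indices(50))
```

Every index found is prime or equal to 4, in line with the remark on Fibonacci numbers in chapter 3.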




Exercise 8.20.1. Assume that $p$ and $q = 2p + 1$ are both prime. Deduce that $q$ divides $2^p - 1$ whenever $p \equiv 3 \pmod 4$, and so, if $p > 3$, $2^p - 1$ is not a Mersenne prime.


In chapter 3 we applied exercise 3.9.2 to the Mersenne numbers $M_n = 2^n - 1$, with $m = 1$, so that if $M_n$ is prime, then $n$ is prime; and to Fibonacci numbers with $m = 2$ so that if $F_n$ is prime, then $n$ is prime or $n = 4$. We can streamline the search for other prime factors of Mersenne numbers by using exercise 8.4.4.


“The hundred-year-old prime”: Euler proved that the Mersenne number $M_{31} = 2147483647$ is prime, as follows: If $M_{31}$ is composite and $q$ is the smallest prime dividing $2^{31} - 1$, then $31 \mid q - 1$ and $q \equiv \pm 1 \pmod 8$, by exercise 8.4.4(a). Hence $q \equiv 1$ or $63 \pmod{248}$ with $q \leq \sqrt{M_{31}} < 2^{31/2} < 46341$. There are 84 such primes $q$ (as Euler discovered by looking at tables of primes) and then he verified that none of them divide $M_{31}$. Hence $M_{31}$ is prime. This remained the largest prime known for over a hundred years. As of January 1, 2019, nine of the ten largest known prime numbers are Mersenne primes (the largest being $M_{82589933}$).
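Euler's verification is short enough to replay by machine. The sketch lists the primes $q \equiv 1$ or $63 \pmod{248}$ with $q^2 \leq M_{31}$ and checks that none divides $M_{31}$; the count it prints can be compared with the figure quoted above. Plain trial division is enough at this size, and the names are illustrative.

```python
M31 = 2**31 - 1


def is_prime_trial(n):
    """Trial division; adequate for n below 46341."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Candidates allowed by the argument: q prime, q = 1 or 63 (mod 248), q^2 <= M31.
candidates = [q for q in range(2, 46341) if q % 248 in (1, 63) and is_prime_trial(q)]
divisors = [q for q in candidates if M31 % q == 0]
print(len(candidates), divisors)  # an empty list of divisors means M31 is prime
```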


Should we believe that there are infinitely many Mersenne primes? A standard heuristic goes as follows: From Gauss’s remarks (see sections 5.4 and 5.15 of appendix 5C) we believe that a randomly chosen integer around $x$ is prime with probability $1/\log x$. Obviously the $M_p$ are actual given numbers and are not randomly chosen, but if we suppose that their primality is about as likely as that of a random number, then we would guess that the number of primes $M_p$ with $p \leq x$ is roughly

$$\sum_{p \leq x} \frac{1}{\log(2^p - 1)} \approx \frac{1}{\log 2}\sum_{p \leq x} \frac{1}{p}.$$

Now this tends to $\infty$ by (5.12.3), and even grows like $\log\log x$, as in (6.12.4). However this heuristic is not supported by the data: There are 51 known Mersenne primes and $M_p$ is known to be composite for around 25 million primes $p$. The heuristic would predict no more than three Mersenne primes up to this point!


We can modify the heuristic to take account of the fact that the prime factors of $2^p - 1$ are all $\equiv 1 \pmod p$ and in particular are all $> p$. The probability that an integer around (sufficiently large) $x$, that is not divisible by any prime $< p$, is prime is around $c\log p/\log x$ for some constant $c > 0$. This alters the sum in our heuristic to

$$c\sum_{p \leq x} \frac{\log p}{\log(2^p - 1)} \approx \frac{c}{\log 2}\sum_{p \leq x}\frac{\log p}{p} \approx \frac{c}{\log 2}\log x.$$

(This last approximation can be deduced from (6.12.4).) This seems much more compatible with the known data; indeed one might even guess that $c/\log 2$ is around 2.55.


Chapter 9 


Quadratic equations 


Can we tell whether a given large integer is the sum of two squares of integers (other than by summing every possible pair of smaller squares)? How about the values of other quadratics? We will show, in this chapter, how we can understand a lot about solutions to quadratic equations in integers, by understanding the solutions to those quadratic equations modulo $p$, for every prime $p$. We begin by studying the values taken by $x^2 + y^2$ when we substitute integers in for $x$ and $y$, then $ax^2 + by^2$ for arbitrary integer coefficients $a, b$, and then finally the general binary quadratic form, $ax^2 + bxy + cy^2$.


9.1. Sums of two squares 


The list of integers that are the sum of two squares of integers begins: 
0,1, 2,4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 25, 26, 29, 32, 34, 36, 37, 40, 41, 45, 49, 50,.... 


Is there a pattern? Can we easily determine whether a given integer is the sum of two squares by any means other than trying to find two squares that sum to it? No pattern emerges easily from the list above so we begin by focusing on the primes that appear in this list, namely

$$2 = 1^2 + 1^2,\quad 5 = 1^2 + 2^2,\quad 13 = 2^2 + 3^2,\quad 17 = 1^2 + 4^2,\quad 29 = 5^2 + 2^2,\quad 37 = 1^2 + 6^2, \ldots.$$

What do the odd primes in the list, $5, 13, 17, 29, 37, 41, 53, 61, 73, 89, 97, \ldots$ have in common? The only easy-to-spot pattern is that the differences between consecutive odd primes in our list, $13 - 5, 17 - 13, 29 - 17, \ldots$ are all multiples of 4, which implies that they are all $\equiv 1 \pmod 4$.
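The list and the observed pattern are easy to regenerate. This brute-force sketch (illustrative names) lists the sums of two squares up to a bound, picks out the odd primes among them, and shows their residues mod 4.

```python
from math import isqrt


def sums_of_two_squares(limit):
    """Sorted list of all n <= limit of the form a^2 + b^2 with integers a, b."""
    reps = set()
    for a in range(isqrt(limit) + 1):
        for b in range(a, isqrt(limit - a * a) + 1):
            reps.add(a * a + b * b)
    return sorted(reps)


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


nums = sums_of_two_squares(100)
odd_primes = [n for n in nums if n % 2 == 1 and is_prime(n)]
print(nums[:25])
print(odd_primes, {p % 4 for p in odd_primes})
```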


Proposition 9.1.1. If $p$ is an odd prime that is the sum of two squares, then $p \equiv 1 \pmod 4$.




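Proof. If $p = a^2 + b^2$, then $p \nmid a$, or else $p \mid p - a^2 = b^2$ so that $p \mid b$ and $p^2 \mid a^2 + b^2 = p$, which is impossible. Similarly $p \nmid b$. Now $a^2 \equiv -b^2 \pmod p$ so that

$$1 = \left(\frac{a^2}{p}\right) = \left(\frac{-b^2}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{b}{p}\right)^2 = \left(\frac{-1}{p}\right),$$

and therefore $p \equiv 1 \pmod 4$ by Theorem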


Exercise 9.1.1. Prove that any odd integer $n$ that can be written as the sum of two squares must be $\equiv 1 \pmod 4$. Deduce Proposition 9.1.1.


Exercise 9.1.2. Prove that if prime $p$ divides $a^2 + b^2$, then either $p = 2$ or $p$ divides $(a,b)$ or $p \equiv 1 \pmod 4$.


Remarkably this is an “if and only if ” condition: 


Theorem 9.1. Every prime $p \equiv 1 \pmod 4$ can be written as the sum of two squares (of integers).


Proof. Since $p \equiv 1 \pmod 4$ we know that there exists an integer $b$ such that $b^2 \equiv -1 \pmod p$. Consider now the set of integers

$$\{j + kb : 0 \leq j, k \leq [\sqrt{p}]\}.$$

The number of pairs of integers $j, k$ used in the construction of this set is $([\sqrt{p}] + 1)^2 > p$, and so by the pigeonhole principle, two of the numbers in the set must be congruent mod $p$; say that

$$j + kb \equiv J + Kb \pmod p$$

where $0 \leq j, k, J, K \leq [\sqrt{p}]$ and $(j,k) \neq (J,K)$. Let $r = j - J$ and $s = K - k$ so that

$$r \equiv bs \pmod p$$

where $|r|, |s| \leq [\sqrt{p}] < \sqrt{p}$ and $r$ and $s$ are not both 0. Now

$$r^2 + s^2 \equiv (bs)^2 + s^2 = s^2(b^2 + 1) \equiv 0 \pmod p,$$

and $0 < r^2 + s^2 < \sqrt{p}^{\,2} + \sqrt{p}^{\,2} = 2p$. The only multiple of $p$ between 0 and $2p$ is $p$, and therefore $r^2 + s^2 = p$.
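The pigeonhole proof is effectively an algorithm, sketched below. The text only asserts that some $b$ with $b^2 \equiv -1 \pmod p$ exists; here we assume one way of finding it, namely $b = n^{(p-1)/4}$ for a quadratic non-residue $n$ (then $b^2 \equiv n^{(p-1)/2} \equiv -1$ by Euler's criterion). The function names are hypothetical.

```python
from math import isqrt


def sqrt_of_minus_one(p):
    """Some b with b^2 = -1 mod p, for a prime p = 1 mod 4 (found via a non-residue)."""
    for n in range(2, p):
        if pow(n, (p - 1) // 2, p) == p - 1:  # n is a non-residue by Euler's criterion
            return pow(n, (p - 1) // 4, p)
    raise ValueError("p must be a prime = 1 mod 4")


def two_squares(p):
    """(r, s) with r^2 + s^2 = p, following the pigeonhole argument of Theorem 9.1."""
    b = sqrt_of_minus_one(p)
    m = isqrt(p)
    seen = {}  # residue of j + k b mod p  ->  the pair (j, k) that produced it
    for j in range(m + 1):
        for k in range(m + 1):
            v = (j + k * b) % p
            if v in seen:  # two of the (m + 1)^2 > p numbers collide mod p
                J, K = seen[v]
                r, s = j - J, K - k  # then r = b s (mod p)
                return abs(r), abs(s)
            seen[v] = (j, k)
    raise AssertionError("the pigeonhole principle guarantees a collision")


print(two_squares(13), two_squares(1009))
```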


We will use the identity

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 \tag{9.1.1}$$

to determine which composite integers can be written as the sum of two squares. Theorem 9.1 tells us that any prime $p \equiv 1 \pmod 4$ can be written as the sum of two squares; for example $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$. Then (9.1.1) yields that $65 = 4^2 + 7^2$; if we write instead $13 = 3^2 + 2^2$, then we obtain $65 = 1^2 + 8^2$. Indeed any integer that is the product of two distinct primes $\equiv 1 \pmod 4$ can be written as the sum of two squares like this, and even in two different ways. We will discuss the number of representations further in appendix 9C.


Exercise 9.1.3. Find four distinct representations of $1105 = 5 \times 13 \times 17$ as a sum of two squares.


Exercise 9.1.4. Prove that if $n = n_1 \cdots n_k$ where $n_1, \ldots, n_k$ are each the sum of two squares, then $n$ is the sum of two squares.




Theorem 9.2. Positive integer $n$ can be written as the sum of two squares of integers if and only if for every prime $p \equiv 3 \pmod 4$ which divides $n$, the exact power of $p$ dividing $n$ is even.


Proof. Suppose that $n = a^2 + b^2$ where $g = (a,b)$, so we can write $a = gA$, $b = gB$, and $n = g^2N$ for some coprime integers $A$ and $B$, with $N = A^2 + B^2$. Therefore if $p$ is a prime $\equiv 3 \pmod 4$, then $p$ cannot divide $N$, by exercise 9.1.2, and so if $p \mid n$, then $p \mid g$. Moreover if $p^k \,\|\, g$, then $p^{2k} \,\|\, n$, as claimed.

On the other hand, if $n = g^2m$ where $m$ is squarefree, then $m$ has no prime factors $\equiv 3 \pmod 4$ by the hypothesis. Therefore all the prime factors of $m$ can be written as the sum of two squares by Theorem 9.1 (and $2 = 1^2 + 1^2$), and so their product, $m$, is the sum of two squares by exercise 9.1.4, say $m = u^2 + v^2$. Then $n = (gu)^2 + (gv)^2$.
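Theorem 9.2 can be compared against brute force for small $n$. In the sketch below (illustrative names), `criterion` implements the condition of the theorem by trial-division factorization, and `is_sum_of_two_squares` searches directly.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Direct search: is n = a^2 + b^2 for some integers a, b?"""
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))


def criterion(n):
    """Theorem 9.2: every prime p = 3 mod 4 divides n to an even power."""
    m, p = n, 2
    while p * p <= m:
        e = 0
        while m % p == 0:  # strip out the exact power of p
            m //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    return not (m > 1 and m % 4 == 3)  # the leftover m is 1 or a prime to the first power


print(all(is_sum_of_two_squares(n) == criterion(n) for n in range(1, 5000)))
```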


Exercise 9.1.5. Prove that if n is squarefree and is the sum of two squares, then every positive 
divisor of n is also the sum of two squares. 


We saw that (9.1.1) is a useful identity. To find such an identity let $i$ be a complex number for which $i^2 = -1$. Then $x^2 + y^2 = (x + iy)(x - iy)$, a factorization into numbers of the form $a + bi$ where $a$ and $b$ are integers. Therefore

$$\begin{aligned} (a^2 + b^2)(c^2 + d^2) &= (a+bi)(a-bi) \cdot (c+di)(c-di) \\ &= (a+bi)(c+di) \cdot (a-bi)(c-di) \\ &= \big((ac - bd) + (ad + bc)i\big) \cdot \big((ac - bd) - (ad + bc)i\big) \\ &= (ac - bd)^2 + (ad + bc)^2, \end{aligned}$$

and so we get (9.1.1). A different rearrangement leads to a different identity:

$$(a^2 + b^2)(c^2 + d^2) = (a+bi)(c-di) \cdot (a-bi)(c+di) = (ac + bd)^2 + (ad - bc)^2. \tag{9.1.2}$$
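The same computation can be done with Gaussian integers represented as integer pairs. The sketch below (illustrative names) checks both identities over a small grid and then recovers the two representations of $65$ from $5 = 1^2 + 2^2$ and $13 = 2^2 + 3^2$.

```python
from itertools import product


def gmul(z, w):
    """Product of Gaussian integers z = (x, y) ~ x + yi and w = (u, v) ~ u + vi."""
    (x, y), (u, v) = z, w
    return (x * u - y * v, x * v + y * u)


def conj(z):
    return (z[0], -z[1])


def norm(z):
    return z[0] ** 2 + z[1] ** 2


for a, b, c, d in product(range(-4, 5), repeat=4):
    z, w = (a, b), (c, d)
    assert gmul(z, w) == (a * c - b * d, a * d + b * c)  # gives (9.1.1)
    assert gmul(z, conj(w)) == (a * c + b * d, b * c - a * d)  # gives (9.1.2)
    assert norm(gmul(z, w)) == norm(z) * norm(w)

print(gmul((1, 2), (2, 3)), gmul((1, 2), conj((2, 3))))  # (-4, 7) (8, 1): 65 = 4^2 + 7^2 = 8^2 + 1^2
```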


Theorem 9.2 has the following surprising corollary:


Exercise 9.1.6. Deduce that positive integer $n$ can be written as the sum of two squares of rationals if and only if $n$ can be written as the sum of two squares of integers.


This suggests that we can focus, in this question, on rational solutions. In section 6.1 we saw how to find all solutions to $x^2 + y^2 = 1$ in rationals $x, y$. How about all rational solutions to $x^2 + y^2 = n$?


Proposition 9.1.2. Suppose that $n = a^2 + b^2$. Then all solutions in rationals $x, y$ to $x^2 + y^2 = n$ are given by the parametrization

$$x = \frac{2brs + a(r^2 - s^2)}{r^2 + s^2}, \qquad y = \frac{2ars + b(s^2 - r^2)}{r^2 + s^2}, \tag{9.1.3}$$

where $r$ and $s$ are coprime integers.


Proof. Let $x, y$ be any rationals for which $x^2 + y^2 = n$. Just as in our geometric proof of we will parametrize these rational points $(x,y)$ by noting that if $t$ is the slope of the line between $(a,b)$ and $(x,y)$, then $t$ is rational, and vice versa. In particular we let $u = x - a$ and $t = (y - b)/u$ when $u \neq 0$, which must both be rational numbers. Then

$$0 = n - n = (a + u)^2 + (b + tu)^2 - (a^2 + b^2) = 2u(a + bt) + u^2(1 + t^2),$$

so that, as $u \neq 0$, we have

$$u = \frac{-2(a + bt)}{1 + t^2} = \frac{2brs - 2as^2}{r^2 + s^2},$$

writing the rational number $t$ as $t = -r/s$ where $r$ and $s$ are coprime integers. Substituting this value of $u$ into $x = a + u$ and $y = b + ut$ gives the claimed parametrization.

If $u = 0$, then $x = a$ so that either $y = b$ or $y = -b$. The line between $(a,b)$ and $(a,-b)$ is the vertical line $x = a$ (corresponding to $r = 1$, $s = 0$ so that $t = \infty$). Finally we obtain the initial point $(a,b)$ in this parametrization by taking $r = a$, $s = b$. This is obtained by taking the slope to be $t = -a/b$, the slope of the tangent line to the curve $x^2 + y^2 = n$ at the point $(a,b)$.
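Exact rational arithmetic makes the parametrization (9.1.3) easy to test. The sketch below (the function name `rational_point` is our own) uses Python's `fractions.Fraction`, checks that every coprime pair $(r,s)$ in a small range gives a point on $x^2 + y^2 = 5$, and confirms the two special cases discussed in the proof.

```python
from fractions import Fraction
from math import gcd


def rational_point(a, b, r, s):
    """The point (x, y) on x^2 + y^2 = a^2 + b^2 given by (9.1.3) for coprime r, s."""
    den = r * r + s * s
    x = Fraction(2 * b * r * s + a * (r * r - s * s), den)
    y = Fraction(2 * a * r * s + b * (s * s - r * r), den)
    return x, y


a, b = 1, 2  # n = 5
n = a * a + b * b
points = [rational_point(a, b, r, s)
          for r in range(-6, 7) for s in range(-6, 7) if gcd(r, s) == 1]
print(all(x * x + y * y == n for x, y in points))
print(rational_point(a, b, a, b), rational_point(a, b, 1, 0))  # (a, b) and (a, -b)
```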


In Theorem 9.1 we saw that every prime $p \equiv 1 \pmod 4$ can be written as the sum of two squares. Examples suggest that there is a unique such representation, up to signs and changing the order of the squares, as the reader will now prove:

Exercise 9.1.7.† Suppose that prime $p = a^2 + b^2$.
(a) Prove that $|a|, |b| < \sqrt{p}$.
(b) Prove that if $r^2 \equiv -1 \pmod p$, then either $r \equiv a/b \pmod p$ or $r \equiv b/a \pmod p$.
(c) If prime $p$ divides $c^2 + d^2$ but $p \nmid cd$, show that $p$ divides either $ac - bd$ or $ad - bc$, and deduce that $p$ divides both terms on the right-hand side of either (9.1.1) or (9.1.2), respectively.
(d) Suppose that $p = a^2 + b^2 = c^2 + d^2$ where $a, b, c, d > 0$. Show that $\{a,b\} = \{c,d\}$.
In other words, we have proved that each prime $\equiv 1 \pmod 4$ has a unique representation as the sum of two squares, unique up to changing the order of the squares, or their signs.


Exercise 9.1.8.† Prove, using the method of Theorem 9.1, that a squarefree integer $n$ can be written as the sum of two squares if and only if $-1$ is a square mod $n$.


9.2. The values of $x^2 + dy^2$


What values does $x^2 + 2y^2$ take? Let’s start again with the prime values:

$$2, 3, 11, 17, 19, 41, 43, 59, 67, 73, 83, 89, 97, \ldots.$$

There is no obvious pattern; but this list contains exactly the same odd primes that we found in section 8.4 when exploring when $(\frac{-2}{p}) = 1$. This link is no coincidence, for if we suppose that odd prime $p = x^2 + 2y^2$, then $p$ does not divide $x$ or $y$ and so

$$1 = \left(\frac{x^2}{p}\right) = \left(\frac{-2y^2}{p}\right) = \left(\frac{-2}{p}\right)\left(\frac{y}{p}\right)^2 = \left(\frac{-2}{p}\right).$$

From (8.7.1) and (8.7.2), we know that $(\frac{-2}{p}) = 1$ if and only if $p \equiv 1$ or $3 \pmod 8$.

On the other hand if $(-2/p) = 1$, then select $b \pmod p$ such that $b^2 \equiv -2 \pmod p$. We take $R = 2^{1/4}\sqrt{p}$, $S = \sqrt{p}/2^{1/4}$ in exercise 9.7.3 so that there exist integers $r$ and $s$, not both 0, with $|r| < R$ and $|s| < S$, for which $p$ divides $r^2 + 2s^2$. Therefore $0 < r^2 + 2s^2 < 2^{3/2}p < 3p$, and so $r^2 + 2s^2 = p$ or $2p$. In the latter case, 2 divides $2p - 2s^2 = r^2$ so that $2 \mid r$. Writing $r = 2t$ we have $s^2 + 2t^2 = p$. Hence, either way, $p$ can be written in the form $m^2 + 2n^2$. Therefore we have proved:


Theorem 9.3. Odd prime $p$ can be written in the form $m^2 + 2n^2$ if and only if $p \equiv 1$ or $3 \pmod 8$.
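A brute-force comparison with Theorem 9.3, sketched with illustrative names: for every odd prime below 5000 we test directly whether $p = x^2 + 2y^2$ and compare with the congruence condition $p \equiv 1$ or $3 \pmod 8$.

```python
from math import isqrt


def is_x2_plus_2y2(n):
    """True if n = x^2 + 2y^2 for some integers x, y (search over y with 2y^2 <= n)."""
    return any(isqrt(n - 2 * y * y) ** 2 == n - 2 * y * y for y in range(isqrt(n // 2) + 1))


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


odd_primes = [p for p in range(3, 5000) if is_prime(p)]
agree = all(is_x2_plus_2y2(p) == (p % 8 in (1, 3)) for p in odd_primes)
print(agree, [p for p in odd_primes if is_x2_plus_2y2(p)][:12])
```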




The identity

$$(a^2 + 2b^2)(c^2 + 2d^2) = (ac + 2bd)^2 + 2(ad - bc)^2$$

is analogous to (9.1.1). Using this, one can prove, analogous to the proof for $u^2 + v^2$ in the first half of section 9.1, that positive integer $n$ can be written as $r^2 + 2s^2$ if and only if for every prime $p \equiv 5$ or $7 \pmod 8$ which divides $n$, the exact power of $p$ dividing $n$ is even.

Can we also modify this proof for values of $x^2 + 3y^2$? Or $x^2 + 5y^2$? We explore this in the following exercises.


Exercise 9.2.1. Fix integer $d \geq 1$. Give an identity showing that the product of two integers of the form $a^2 + db^2$ is also of this form.


Exercise 9.2.2. Which primes are of the form $a^2 + 3b^2$? Which integers?


Exercise 9.2.3. Which primes are of the form $a^2 + 5b^2$? Try listing what primes are represented and compare the list with the set of primes $p$ for which $(-5/p) = 1$.


9.3. Is there a solution to a given quadratic equation? 


It is easy to see that there do not exist non-zero integers $a, b, c$ such that $a^2 + 5b^2 = 3c^2$, for, if we take the smallest non-zero solution, then we have

$$a^2 \equiv 3c^2 \pmod 5$$

which implies that $a \equiv c \equiv 0 \pmod 5$ since $(3/5) = -1$, and so $25$ divides $3c^2 - a^2 = 5b^2$, giving $b \equiv 0 \pmod 5$. Therefore $a/5, b/5, c/5$ gives a smaller solution to $x^2 + 5y^2 = 3z^2$, contradicting minimality.

Another proof stems from looking at the equation mod 4 since then $a^2 + b^2 + c^2 \equiv 0 \pmod 4$, and 0 and 1 are the only squares mod 4. Therefore if three squares sum to an integer that is $\equiv 0 \pmod 4$, then they must all be even. But then $a/2, b/2, c/2$ gives a smaller solution, contradicting minimality.


So we have now presented two different proofs that there are no non-zero solutions in integers to $a^2 + 5b^2 = 3c^2$, by working with two different moduli.
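Both local obstructions, and the resulting absence of small solutions, can be confirmed by machine. The sketch below (illustrative names) checks the two facts used above about squares mod 5 and mod 4, then searches a box for non-zero solutions of $a^2 + 5b^2 = 3c^2$; of course a finite search only illustrates what the descent argument proves.

```python
from itertools import product
from math import isqrt

# mod 5: the squares are {0, 1, 4}, so 3 is not a square and a^2 = 3c^2 forces a = c = 0 (mod 5)
squares_mod_5 = {x * x % 5 for x in range(5)}

# mod 4: a^2 + 5b^2 - 3c^2 = a^2 + b^2 + c^2 (mod 4), which is 0 only if a, b, c are all even
even_only = all(a % 2 == 0 and b % 2 == 0 and c % 2 == 0
                for a, b, c in product(range(4), repeat=3)
                if (a * a + 5 * b * b - 3 * c * c) % 4 == 0)


def small_solutions(bound):
    """Non-zero (a, b, c) with 0 <= a, c <= bound and a^2 + 5b^2 = 3c^2."""
    sols = []
    for a in range(bound + 1):
        for c in range(bound + 1):
            rest = 3 * c * c - a * a  # must equal 5 b^2
            if (a, c) != (0, 0) and rest >= 0 and rest % 5 == 0:
                b = isqrt(rest // 5)
                if b * b == rest // 5:
                    sols.append((a, b, c))
    return sols


print(squares_mod_5, even_only, small_solutions(300))
```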


For all quadratic equations in three or more variables with real solutions, there is never just one prime or prime power modulo which there are no solutions to the given equation: when there is one, there is always a second. And indeed when there is a third, there is always a fourth. A remarkable consequence of the theory (see appendix 9B) is that if a given quadratic equation in three or more variables has non-zero real solutions but no non-zero integer solutions, then there is always an even number of different primes $p$ such that the given equation has no non-trivial solutions mod $p^k$ for some $k \geq 1$. Moreover the odd primes involved must divide the coefficients of the equation. On the other hand, if there are no such “mod $p^k$ obstructions”, then there must be at least one non-zero integer solution (implying that there must be a real solution!).


In exercise 3.6.4 we proved that there are integer solutions $(m,n)$ to $am + bn = c$ if and only if there are solutions $u, v \pmod b$ to $au + bv \equiv c \pmod b$. Similarly we will show that if $a$, $b$, and $c$ are pairwise coprime, positive integers, then there are rational solutions $(x,y)$ to $ax^2 + by^2 = c$ if and only if there are coprime solutions $u, v \pmod{4abc}$ to $au^2 + bv^2 \equiv c \pmod{4abc}$. This is an amazing theorem since to determine whether a quadratic equation has solutions in rationals we need only verify whether it has solutions modulo a finite modulus.


To work on rational solutions $(x,y)$ to $ax^2 + by^2 = c$ it is convenient to develop this into a question about integer solutions and to manipulate the equation to a more convenient form:


(i) We may assume that each of $a, b, c$ is a squarefree integer or else, if, say, $a = p^2A$, the rational solutions to $ax^2 + by^2 = c$ are in 1-to-1 correspondence with those of $AX^2 + by^2 = c$, taking $X = px$. If $b$ is divisible by a square, we proceed analogously. If $c = q^2C$, then the rational solutions to $ax^2 + by^2 = c$ are in 1-to-1 correspondence with those of $aX^2 + bY^2 = C$, taking $X = x/q$ and $Y = y/q$.

(ii) We may assume that $a, b, c$ are pairwise coprime or else if, say, $a = pA$ and $b = pB$, then $AX^2 + BY^2 = C$ with $X = px$, $Y = py$, and $C = pc$; and if $a = qA$ and $c = qC$, then $Ax^2 + BY^2 = C$ with $B = bq$ and $Y = y/q$.

(iii) Letting $n$ be the lowest common denominator of the rationals $x$ and $y$, we write $x = \ell/n$ with $y = m/n$ so that $\ell, m, n$ are integers with $(\ell, m, n) = 1$ and $a\ell^2 + bm^2 = cn^2$.

(iv) We may assume that $a\ell^2, bm^2, cn^2$ are pairwise coprime. If not, suppose that prime $p$ divides $a\ell^2$ and $bm^2$, so that $p$ divides $a\ell^2 + bm^2 = cn^2$. Now $p$ can only divide one of $a, b, c$ (since they are pairwise coprime), say, $c$, and so must divide $\ell^2$ and $m^2$. But then $p$ divides $\ell$ and $m$, and so $p^2$ divides $a\ell^2 + bm^2 = cn^2$. Hence $p$ divides $n$, as $p^2 \nmid c$, contradicting that $(\ell, m, n) = 1$.


Therefore the correct formulation of our result is as follows:


Theorem 9.4 (The local-global principle for quadratic equations). Let $a$, $b$, and $c$ be given pairwise coprime, squarefree integers. There are solutions in

non-zero integers $\ell, m, n$ to $a\ell^2 + bm^2 + cn^2 = 0$ with $(a\ell^2, bm^2) = 1$

if and only if there are solutions in

non-zero real numbers $\lambda, \mu, \nu$ to $a\lambda^2 + b\mu^2 + c\nu^2 = 0$,

and, for all positive integers $r$, there exist

residue classes $u, v, w \pmod r$ for which $au^2 + bv^2 + cw^2 \equiv 0 \pmod r$,

with $(au^2, bv^2, cw^2, r) = 1$.


Proof of ⇒: We may take $\lambda = u = \ell$, $\mu = v = m$, $\nu = w = n$ throughout.


The proof in the other direction is the difficult part; it follows along the lines of the proof of Theorem 9.1 but is more complicated. In appendix 9A we rephrase that proof in the language of lattices, before completing the proof of the local-global principle.

We can reduce the set of moduli to be considered using the following lemma.


Lemma 9.3.1. Let $a, b, c$ be given pairwise coprime, squarefree integers. There are residue classes $u, v, w \pmod r$ with $(au^2, bv^2, cw^2, r) = 1$ for which

$$au^2 + bv^2 + cw^2 \equiv 0 \pmod r$$

for every positive integer $r$, if and only if there are such solutions for $r = 8$, and for $r = p$ for every odd prime $p$ dividing $abc$.


This result implies that, as in exercise 3.6.4, we can restrict our attention in Theorem 9.4 to just one modulus, namely $r = 8|abc|$.


Proof. We can restrict our attention to prime power moduli $p^k$ by the Chinese Remainder Theorem. We will prove that there are such appropriate solutions mod $p^k$ by induction on $k$: for $k \geq 1$ when $p$ is odd and for $k \geq 3$ when $p = 2$. There are appropriate solutions modulo every odd prime $p$ and modulo $2^3$, by the hypothesis for primes $p$ dividing $2abc$, and by exercise 8.9.9 for all odd primes $p$ that do not divide $abc$.


So now assume we have an appropriate solution mod $p^k$, so that $p$ does not divide at least one of $au^2, bv^2, cw^2$, say, $au^2$ (and an analogous argument works if $p$ does not divide one of the others). Let $R = -a^{-1}(bv^2 + cw^2)$, so that $u^2 \equiv R \pmod{p^k}$ by the induction hypothesis. By Proposition 8.8.1 there exists $U \pmod{p^{k+1}}$ for which $U^2 \equiv R \pmod{p^{k+1}}$ so that $aU^2 + bv^2 + cw^2 \equiv 0 \pmod{p^{k+1}}$ and $(U, p) = 1$.


Now if $au^2 + bv^2 + cw^2 \equiv 0 \pmod a$ with $(a, bv^2, cw^2) = 1$, then $-bc \equiv (cw/v)^2 \pmod a$; that is, $-bc$ is a square $\pmod p$ for every prime $p$ dividing $a$. Making similar remarks modulo $b$ and $c$, we find Legendre’s formulation of the local-global principle.¹


Theorem 9.5 (Legendre’s local-global principle, 1785). Let $a, b, c$ be given pairwise coprime, squarefree integers which do not all have the same sign. There are solutions in non-zero integers $\ell, m, n$ to $a\ell^2 + bm^2 + cn^2 = 0$ if and only if $-ab$ is a square mod $|c|$, $-ac$ is a square mod $|b|$, and $-bc$ is a square mod $|a|$.


Note that $a\ell^2 + bm^2 + cn^2 = 0$ has solutions in non-zero reals if and only if $a, b, c$ do not all have the same sign.
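Legendre's criterion is easy to evaluate mechanically. The sketch below is a hedged illustration with our own helper names: `legendre_criterion` checks the sign condition and the three "is a square" conditions by brute force, and `small_solution` searches for a non-zero solution with all entries at most `bound`, so that the two can be compared on small examples.

```python
def is_square_mod(n, m):
    """True if n is congruent to a square modulo m (for m >= 1)."""
    return any((x * x - n) % m == 0 for x in range(m))


def legendre_criterion(a, b, c):
    """Theorem 9.5 for pairwise coprime, squarefree a, b, c."""
    if (a > 0) == (b > 0) == (c > 0):
        return False  # all the same sign: no non-zero real solutions
    return (is_square_mod(-a * b, abs(c))
            and is_square_mod(-a * c, abs(b))
            and is_square_mod(-b * c, abs(a)))


def small_solution(a, b, c, bound):
    """Search 1 <= l, m, n <= bound with a l^2 + b m^2 + c n^2 = 0."""
    rng = range(1, bound + 1)
    for l in rng:
        for m in rng:
            for n in rng:
                if a * l * l + b * m * m + c * n * n == 0:
                    return (l, m, n)
    return None


print(legendre_criterion(5, 3, -2), small_solution(5, 3, -2, 5))
```

For $5\ell^2 + 3m^2 - 2n^2 = 0$ the criterion holds and the search finds $(1, 1, 2)$; for $\ell^2 + m^2 - 3n^2 = 0$ the criterion fails ($-1$ is not a square mod 3) and no solution turns up.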


This principle may be extended to the rational solutions of more or less any quadratic equation: any quadratic polynomial in $n$ variables can be diagonalized; that is, a linear change of variables can change the polynomial into a diagonal quadratic polynomial. We know that in the example $g = ax^2 + bxy + cy^2$ we can let $X = x + by/2a$, and then $g = aX^2 + Dy^2$, where $D = -(b^2 - 4ac)/4a$. In a three-variable example we take the polynomial


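$$f = x^2 + 2xy + 3xz + 4y^2 + 5yz + 6z^2 + 7x + 8y + 9z + 10;$$

we let $X = x + y + \frac{3}{2}z + \frac{7}{2}$ replace $x$ to obtain $f = X^2 + 3y^2 + 2yz + \frac{15}{4}z^2 + y - \frac{3}{2}z - \frac{9}{4}$. Then letting $Y = y + \frac{z}{3} + \frac{1}{6}$ we obtain $f = X^2 + 3Y^2 + \frac{41}{12}z^2 - \frac{11}{6}z - \frac{7}{3}$, and if $Z = z - \frac{11}{41}$ this becomes

$$F = X^2 + 3Y^2 + \frac{41}{12}Z^2 - \frac{423}{164},$$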


¹ The careful reader will note that we do not seem to have made adequate remarks about the solutions modulo powers of 2. However, we noted earlier in this section that if there are solutions in the reals and modulo all but one prime, then there is a solution modulo all powers of this last prime. For more details see appendix 9B.




a diagonal quadratic with no “cross terms” (like $XY$). Notice that the rational solutions to $F(X, Y, Z) = 0$ are in 1-to-1 correspondence with the rational solutions to $f(x, y, z) = 0$.
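The completing-the-square steps above can be checked exactly with rational arithmetic. This hedged sketch (function names ours) evaluates $f$ directly and also evaluates $F$ after the substitutions $X = x + y + \frac32 z + \frac72$, $Y = y + \frac{z}{3} + \frac16$, $Z = z - \frac{11}{41}$, using `fractions.Fraction` so that no rounding occurs; agreement at many points is a sanity check of the algebra, not a proof.

```python
from fractions import Fraction
from itertools import product


def f(x, y, z):
    """The original three-variable quadratic polynomial."""
    return (x * x + 2 * x * y + 3 * x * z + 4 * y * y + 5 * y * z + 6 * z * z
            + 7 * x + 8 * y + 9 * z + 10)


def F_after_substitution(x, y, z):
    """F = X^2 + 3Y^2 + (41/12)Z^2 - 423/164 in the new variables."""
    X = x + y + Fraction(3, 2) * z + Fraction(7, 2)
    Y = y + Fraction(1, 3) * z + Fraction(1, 6)
    Z = z - Fraction(11, 41)
    return X * X + 3 * Y * Y + Fraction(41, 12) * Z * Z - Fraction(423, 164)


points = [Fraction(-3, 2), 0, 1, Fraction(5, 7)]
print(all(f(*p) == F_after_substitution(*p) for p in product(points, repeat=3)))
```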

Whether or not a given diagonal quadratic with three or more terms has rational solutions can then be resolved by the local-global principle².
Exercise 9.3.1. Given one integer solution $(x_0, y_0, z_0)$ to $ax^2 + by^2 + cz^2 = 0$, show that all other integer solutions to $ax^2 + by^2 + cz^2 = 0$ are given by the parametrization

$$x : y : z = (ar^2 - bs^2)x_0 + 2brs\,y_0 \;:\; 2ars\,x_0 - (ar^2 - bs^2)y_0 \;:\; (ar^2 + bs^2)z_0 .$$
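The exercise asks for a proof, but it is reassuring to test the parametrization numerically first. The hedged sketch below (function name `param_point` is ours) uses the example $a = b = 1$, $c = -2$ with the starting solution $(x_0, y_0, z_0) = (1, 1, 1)$ and checks that each pair $(r, s) \ne (0, 0)$ gives a point on the conic; it does not check that every solution arises.

```python
def param_point(a, b, x0, y0, z0, r, s):
    """The point (x, y, z) given by the parametrization in Exercise 9.3.1."""
    A = a * r * r - b * s * s
    return (A * x0 + 2 * b * r * s * y0,
            2 * a * r * s * x0 - A * y0,
            (a * r * r + b * s * s) * z0)


# x^2 + y^2 - 2z^2 = 0 starting from (1, 1, 1); (r, s) = (1, 2) gives a new point.
print(param_point(1, 1, 1, 1, 1, 1, 2))
```

Here $(r, s) = (1, 2)$ produces $(1, 7, 5)$, and indeed $1 + 49 = 2 \cdot 25$.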


9.4. Representation of integers by $ax^2 + by^2$ with $x, y$ rational, and beyond


Coprime integer solutions to $au^2 + bv^2 = cw^2$ with $w > 0$ are in 1-to-1 correspondence with the rational solutions to $ax^2 + by^2 = c$, by taking $x = u/w$ and $y = v/w$. Therefore the local-global principle can be restated to give an “if and only if” criterion to determine whether $c$ can be written as $ax^2 + by^2$ with $x$ and $y$ rational. This is most usefully modified as follows:


Corollary 9.4.1. Suppose that $a, b, c$ are given integers with $(a, b, c) = 1$, and suppose $d = b^2 - 4ac$ is not divisible by the square of any odd prime. For any given squarefree integer $N$ with $(N, d) = 1$, there exist rationals $u$ and $v$ for which $N = au^2 + buv + cv^2$ if and only if the following criteria hold:

• $N$ has the same sign as $a$ or $c$, or $d > 0$;

• $d$ is a square mod $N$;

• $\left(\frac{N}{p}\right) = \left(\frac{a}{p}\right)$ for all odd primes $p$ dividing $d$ that do not divide $a$;
• $\left(\frac{N}{p}\right) = \left(\frac{c}{p}\right)$ for all odd primes $p$ dividing both $d$ and $a$.
Proof. If $N = au^2 + buv + cv^2$, then we multiply through by $4a$ to obtain $4aN = (2au + bv)^2 - dv^2$; in other words, $aN = U^2 - dV^2$ for some rationals $U, V$ (namely $U = (2au + bv)/2$ and $V = v/2$). We may reverse this argument, and so there exist rationals $u$ and $v$ for which $N = au^2 + buv + cv^2$ if and only if there exist rationals $U, V$ for which $aN = U^2 - dV^2$. We now apply Legendre’s version of the local-global principle to rational solutions of the equation $aN = U^2 - dV^2$.

We have real solutions if and only if $aN > 0$ or $d > 0$.


Now $U^2 \equiv dV^2 \pmod{aN}$, and so $d$ must be a square mod $aN$. But $d = b^2 - 4ac \equiv b^2 \pmod a$, so we need only verify that $d$ is a square mod $N$.


If odd prime $p$ divides $d$, then $aN \equiv U^2 \pmod p$, and so $\left(\frac{N}{p}\right) = \left(\frac{a}{p}\right)$ if $p$ does not divide $a$.

If odd prime $p$ divides both $d$ and $a$, then it divides $b$, as it divides $b^2 = d + 4ac$. Therefore $p$ does not divide $c$, as $(a, b, c) = 1$. We then run through the analogous argument with $a$ replaced by $c$. (For the primes $p$ dividing $d$ but not $4ac$, our results that $\left(\frac{N}{p}\right) = \left(\frac{a}{p}\right)$ and $\left(\frac{N}{p}\right) = \left(\frac{c}{p}\right)$ are consistent; see exercise 8.1.4.)
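As a hedged illustration, the sketch below evaluates the four criteria of Corollary 9.4.1 literally (the helper names are ours), for $a = 1$, $b = 0$, $c = 5$, so that $d = -20$ is divisible by 4 but by no odd square. For odd primes $N = p \ne 5$ the criteria should single out exactly the primes $p \equiv 1, 9 \pmod{20}$, which matches the list of primes represented by $x^2 + 5y^2$ given in section 9.6. The code only checks the criteria; it does not search for rational representations.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    t = pow(n % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t


def is_square_mod(n, m):
    return any((x * x - n) % m == 0 for x in range(m))


def odd_prime_factors(n):
    n, p, out = abs(n), 3, []
    while n % 2 == 0:
        n //= 2
    while p * p <= n:
        if n % p == 0:
            out.append(p)
            while n % p == 0:
                n //= p
        p += 2
    return out + ([n] if n > 1 else [])


def corollary_941(a, b, c, N):
    """The criteria of Corollary 9.4.1 (assumes its hypotheses on a, b, c, N)."""
    d = b * b - 4 * a * c
    if not (N * a > 0 or N * c > 0 or d > 0):      # sign condition
        return False
    if not is_square_mod(d, abs(N)):               # d is a square mod N
        return False
    for p in odd_prime_factors(d):                 # the two Legendre conditions
        target = c if a % p == 0 else a
        if legendre(N, p) != legendre(target, p):
            return False
    return True


primes = [p for p in range(3, 200) if all(p % q for q in range(2, p)) and p != 5]
print([p for p in primes if corollary_941(1, 0, 5, p)])
```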


² We have only proved this in three variables, but it is true in three or more variables.




9.5. The failure of the local-global principle for quadratic 
equations in integers 


We have seen how the local-global principle allows us to determine whether there are rational solutions $x, y$ to a given equation of the form $ax^2 + by^2 = c$. However, we will now show that it does not help when we ask for integer solutions. The example

$$x^2 + 23y^2 = 52$$

has rational solutions, like $(\tfrac12, \tfrac32)$ and $(\tfrac{10}{3}, \tfrac43)$. There are obviously no integer solutions, or else $23y^2 \le x^2 + 23y^2 = 52$ and so $y^2 = 0$ or $1$, but then $x^2 = 52 - 23y^2 = 52$ or $29$, which are not squares. Since there are rational solutions, we know that there are non-trivial solutions to $a^2 + 23b^2 \equiv 52c^2 \pmod{p^k}$ for all prime powers $p^k$ by the local-global principle, but not necessarily to $a^2 + 23b^2 \equiv 52 \pmod{p^k}$. To prove that there are such solutions, we show that solutions exist modulo 8 and all odd prime moduli $p$, and then we lift these solutions to all prime power moduli $p^k$, using Proposition 8.8.1.

We have the solutions $2^2 + 23 \cdot 4^2 = 372 \equiv 52 \pmod 8$, $4^2 + 23 \cdot 1^2 = 39 \equiv 52 \pmod{13}$, and $11^2 + 23 \cdot 0^2 = 121 \equiv 52 \pmod{23}$. For any odd prime $p$ other than 13 or 23, there are $\frac{p+1}{2}$ residues mod $p$ of the form $23y^2$, and $\frac{p+1}{2}$ residues mod $p$ of the form $52 - x^2$; since $\frac{p+1}{2} + \frac{p+1}{2} > p$, some residue appears in both sets. Therefore there is a solution to $x^2 + 23y^2 \equiv 52 \pmod p$, and evidently one of $x$ and $y$ must be non-zero (mod $p$) (or else $p$ would divide 52).
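Each claim in this example can be checked by machine. The hedged sketch below (helper names ours) confirms that there are no integer solutions, verifies the two rational solutions exactly with `fractions.Fraction`, and brute-forces solvability modulo a few prime powers and small odd primes. Incidentally, the rational point $(\frac12, \frac32)$ already gives a solution modulo every odd prime power, and $(\frac{10}{3}, \frac43)$ one modulo every power of 2, since 2 and 3 respectively are invertible there.

```python
from fractions import Fraction
from math import isqrt


def integer_solutions(n=52, d=23):
    """All (x, y) with x, y >= 0 and x^2 + d y^2 = n."""
    sols = []
    for y in range(isqrt(n // d) + 1):
        rest = n - d * y * y
        if isqrt(rest) ** 2 == rest:
            sols.append((isqrt(rest), y))
    return sols


def solvable_mod(m, n=52, d=23):
    """Is x^2 + d y^2 = n (mod m) solvable? Brute force over residues."""
    return any((x * x + d * y * y - n) % m == 0
               for x in range(m) for y in range(m))


RATIONAL_POINTS = [(Fraction(1, 2), Fraction(3, 2)), (Fraction(10, 3), Fraction(4, 3))]

print(integer_solutions(), [x * x + 23 * y * y for x, y in RATIONAL_POINTS])
```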


Therefore we have shown that the local-global principle holds for integer and rational solutions of linear equations, and for rational but not integer solutions of quadratic equations. However, it does not even hold for rational solutions of cubic equations: in 1957, Selmer showed that $3x^3 + 4y^3 = 5$ has solutions in the reals, and mod $r$ for all $r \ge 1$, yet has no rational solutions. Further discussion of the failure of the local-global principle for cubic equations can be found in [Grab], with a motivating discussion in chapter 7.


9.6. Primes represented by $x^2 + 5y^2$


Calculations reveal that the primes $> 5$ that are represented by $x^2 + 5y^2$ are
$29, 41, 61, 89, 101, 109, 149, 181, \ldots$.


From our explorations of the binary quadratic forms $x^2 + y^2$, $x^2 + 2y^2$, and $x^2 + 3y^2$, we might guess that this should be the set of primes for which $(-5/p) = 1$. However, the list of primes for which $(-5/p) = 1$ also includes the primes

$3, 7, 23, 43, 47, 67, 83, 103, 107, 127, 163, 167, \ldots$


What is going on? We quickly see that the primes in the first list end in a 1 or a 9, whereas the primes in the second list end in a 3 or a 7, so there seems to be a further congruence condition that partitions the list. Further examination of the equation $p = x^2 + 5y^2$ makes this evident: besides $(-5/p) = 1$, we can also deduce that $p \equiv x^2 \pmod 5$, so that $(p/5) = 1$. Combined with $(-5/p) = 1$, this also yields that $p \equiv 1 \pmod 4$. These two conditions together give that $p \equiv 1$ or $9 \pmod{20}$,




the primes that we see in the first list, and if $(p/5) = -1$, then we obtain $p \equiv 3$ or $7 \pmod{20}$, the primes that we see in the second list.


Where do the primes in the second list come from? It turns out there is a second, fundamentally different binary quadratic form, $2x^2 + 2xy + 3y^2$, which has the same discriminant $-20$ as $x^2 + 5y^2$. We first observe that these quadratic forms definitely do not represent the same integers, because $2x^2 + 2xy + 3y^2$ represents 3, whereas $x^2 + 5y^2$ evidently does not. A quick calculation reveals that the second list is precisely the set of odd primes represented by $2x^2 + 2xy + 3y^2$. This dichotomy will be explored further in chapter 12, though we observe here that if prime $p = 2x^2 + 2xy + 3y^2$, then $2p = 4x^2 + 4xy + 6y^2 = (2x + y)^2 + 5y^2$; that is, $2p$ can be represented by $a^2 + 5b^2$.
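The "quick calculation" can be reproduced by brute force. In the hedged sketch below (names ours), a positive definite form is tested on all $|x|, |y| \le \sqrt n + 1$, which is enough for both forms here since each is at least $x^2 + y^2$ times a constant larger than 1. The two printed lists should be the two lists of primes displayed above.

```python
from math import isqrt


def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))


def represents(form, n):
    """Brute force: does form(x, y) = n for some |x|, |y| <= sqrt(n) + 1?"""
    B = isqrt(n) + 1
    return any(form(x, y) == n for x in range(-B, B + 1) for y in range(-B, B + 1))


def principal(x, y):
    return x * x + 5 * y * y


def other(x, y):
    return 2 * x * x + 2 * x * y + 3 * y * y


first = [p for p in range(7, 200) if is_prime(p) and represents(principal, p)]
second = [p for p in range(3, 170) if is_prime(p) and represents(other, p)]
print(first)
print(second)
```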

In general, if we wish to represent the odd prime $p$ by $x^2 + dy^2$, then $-d$ must be a square mod $p$. On the other hand, suppose that $-d$ is a square mod $p$, say $u^2 \equiv -d \pmod p$ with $|u| < p/2$.

If $p < 2\sqrt d$, then we can write $u^2 + d = ap$, so the binary quadratic form $pm^2 + 2umn + an^2$ has discriminant $-4d$, the same as $x^2 + dy^2$, and takes the value $p$ when $m = 1, n = 0$.

Now assume that $p > 2\sqrt d$. By exercise …, with $R = d^{1/4}\sqrt p$ and $S = \sqrt p/d^{1/4}$ (so that $RS = p$), there exist integers $r$ and $s$, not both 0, with $|r| \le R$ and $|s| \le S$, for which $r \equiv us \pmod p$ and so, squaring, $r^2 \equiv -ds^2 \pmod p$; that is, $r^2 + ds^2$ is a multiple of $p$. Moreover we have $0 < r^2 + ds^2 \le R^2 + dS^2 = 2\sqrt d\,p$. Therefore there exists an integer $a$ in the range $1 \le a \le 2\sqrt d$ for which

$$r^2 + ds^2 = ap.$$


We may assume that $(r, s) = 1$, for if $g = (r, s)$, then we claim that $g^2$ divides $a$, so we can divide $r$ and $s$ through by $g$. To justify our claim, note that $g^2$ divides $r^2 + ds^2 = ap$, so if $g^2$ does not divide $a$, then $p$ divides $g$. But then $p^2 \le g^2 \le r^2 + ds^2 = ap$, and so $p \le a \le 2\sqrt d$, a contradiction.

Now $(s, a) = 1$, or else if prime $q$ divides $a$ and $s$, then it divides $ap - ds^2 = r^2$, and so it divides $r$, contradicting that $(r, s) = 1$. Let $b$ be an integer for which $b \equiv r/s \pmod a$, so that $b^2 \equiv -d \pmod a$. We define integers $n = s$, $m = (r - bs)/a$, and $c = (b^2 + d)/a$. This implies that $am + bn = r$ and so

$$am^2 + 2bmn + cn^2 = \frac{(am + bn)^2 - (b^2 - ac)n^2}{a} = \frac{r^2 + ds^2}{a} = p.$$


Therefore, whenever $-d$ is a square mod $p$, there is a quadratic form in two variables, with positive leading coefficient $\le 2\sqrt d$ and of discriminant $-4d$, which takes the value $p$. This is the first hint of a general theory: we will study the solutions to quadratic equations in two variables, like this, in detail, in chapter 12.
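The argument above is effectively an algorithm, and the hedged Python sketch below follows it step by step (the function name and the brute-force searches for $u$ and for $(r, s)$ are our own choices; it assumes Python 3.8+ for `pow(s, -1, a)`). In the case $p > 2\sqrt d$ it uses the integer parts of $R = d^{1/4}\sqrt p$ and $S = \sqrt p/d^{1/4}$, which the pigeonhole principle still allows, and for each $s$ it takes the least absolute residue $r \equiv us \pmod p$.

```python
from math import gcd, isqrt


def form_representing(p, d):
    """Return (a, b, c, m, n) with a m^2 + 2b mn + c n^2 = p and b^2 - ac = -d.

    Assumes p is an odd prime not dividing d > 0 and -d is a square mod p.
    """
    u = next(x for x in range(p) if (x * x + d) % p == 0)
    if u > p // 2:
        u -= p                                   # so that |u| < p/2
    if p * p < 4 * d:                            # the case p < 2 sqrt(d)
        return p, u, (u * u + d) // p, 1, 0
    R = isqrt(isqrt(d * p * p))                  # floor(d^(1/4) sqrt(p))
    S = isqrt(isqrt(p * p // d))                 # floor(sqrt(p) / d^(1/4))
    for s in range(1, S + 1):
        r = (u * s) % p
        if r > p // 2:
            r -= p                               # least absolute residue of us
        if abs(r) <= R:                          # r = us (mod p), |r| <= R, |s| <= S
            break
    else:
        raise AssertionError("the pigeonhole principle guarantees such r, s")
    g = gcd(r, s)
    r, s = r // g, s // g                        # now (r, s) = 1
    a = (r * r + d * s * s) // p                 # r^2 + d s^2 = a p
    b = 0 if a == 1 else (r * pow(s, -1, a)) % a   # b = r/s (mod a), as (s, a) = 1
    m, n, c = (r - b * s) // a, s, (b * b + d) // a
    return a, b, c, m, n


print(form_representing(29, 5), form_representing(7, 5))
```

For $d = 5$ it returns $(1, 0, 5, -3, 2)$ for $p = 29$, that is $29 = (-3)^2 + 5 \cdot 2^2$, and $(2, 1, 3, 1, 1)$ for $p = 7$, recovering the form $2m^2 + 2mn + 3n^2$ met above.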


Additional exercises 


Exercise 9.7.1. Let f(n) be the arithmetic function for which f(n) = 1 if n can be written as 
the sum of two squares, and f(n) = 0 otherwise. Prove that f(n) is a multiplicative function. 




Exercise 9.7.2. Let $p$ be a prime $\equiv 1 \pmod 4$. This exercise yields another proof that $p$ is the sum of two squares.
(a) Use Theorem 8.3 to prove that there exist integers $a$ and $b$ such that $a^2 + b^2$ is a positive multiple of $p$.
(b) Let $rp$ be the smallest such multiple of $p$. Prove that $r < p/2$.
(c)† Prove that if $r > 1$, then there exists a positive integer $s < r/2$ such that $rs = c^2 + d^2$ for some integers $c$ and $d$, selected so that $ad - bc$ is divisible by $r$.
(d) Use … to deduce that if $r > 1$, then $sp$ is a sum of two squares.

This contradicts the minimality of $r$ unless $r = 1$; that is, $p$ is the sum of two squares.


Exercise 9.7.3. Let $p$ be an odd prime.
(a)† Suppose that $b \pmod p$ is given and that $R, S > 1$ are such that $RS = p$. Prove that there exist integers $r, s$ with $|r| < R$, $0 < s < S$ such that $b \equiv r/s \pmod p$.
(b) Prove that there exists an integer $m$ with $|m| < \sqrt p$ for which $\left(\frac{m}{p}\right) = -1$.
(c) Deduce that if $p \equiv 1 \pmod 4$, then there exists an integer $n$ in the range $1 < n < \sqrt p$ for which $\left(\frac{n}{p}\right) = -1$.


Exercise 9.7.4. Show that $x$ and $y$ are integers in (9.1.3) if and only if $r^2 + s^2$ divides $2(ar + s)$, and show that this can only happen if $r^2 + s^2$ divides $2n$.


Exercise 9.7.5. What values of $r$ and $s$ yield the point $(-a, -b)$ in Proposition 9.1.2?
Exercise 9.7.6. Reprove exercise 9.1.8 using Theorem 9.1 and (9.1.1).


Exercise 9.7.7.† $33^2 + 56^2 = 65^2$ and $16^2 + 63^2 = 65^2$ are examples of the side lengths of different primitive Pythagorean triangles with the same hypotenuse. Classify those integers that appear as the hypotenuse of at least two different primitive Pythagorean triangles.


Exercise 9.7.8. Prove that for every integer $m$ there exists an integer $n$ which is the length of the hypotenuse of at least $m$ different primitive Pythagorean triples. (You may use Theorem 7.4, which implies that there are infinitely many primes $\equiv 1 \pmod 4$.)


Exercise 9.7.9.† Prove that an integer of the form $a^2 + 4b^2$ with $(a, 2b) = 1$ cannot be divisible by any integer of the form $m^2 - 2$ with $m > 1$, or $m^2 + 2$. Conversely, prove that an integer of the form $m^2 - 2n^2$ or $m^2 + 2n^2$ with $(m, 2n) = 1$ cannot be divisible by any integer of the form $a^2 + 4$.


Exercise 9.7.10.† (Zagier’s proof that every prime $p \equiv 1 \pmod 4$ is the sum of two squares) Let
$$S := \{(x, y, z) \in \mathbb{N}^3 : p = x^2 + 4yz\}.$$
Define the map $\phi: S \to S$ by
$$\phi(x, y, z) = \begin{cases} (x + 2z,\ z,\ y - x - z) & \text{if } x < y - z,\\ (2y - x,\ y,\ x - y + z) & \text{if } y - z < x < 2y,\\ (x - 2y,\ x - y + z,\ y) & \text{if } x > 2y.\end{cases}$$
(a) Show that $\phi$ is an involution, that is, $\phi^2 = 1$, and verify that $\phi(v)$ belongs to $S$ for each $v \in S$.
(b) Prove that if $\phi(v) = v$, then $v = (1, 1, \frac{p-1}{4})$.
(c) Deduce that there are an odd number of elements of $S$ (in particular, $S$ is non-empty).
Let $\psi: S \to S$ be the involution $\psi(x, y, z) = (x, z, y)$.
(d) Prove that $\psi$ has a fixed point $(x, y, y)$, so that $z = y$.
(e) Deduce that $p = x^2 + (2y)^2$ for some integers $x, y$.
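Before proving parts (a) to (e), one can watch Zagier's argument run. The hedged Python sketch below (function names ours) builds $S$, checks that $\phi$ maps $S$ to itself and is an involution, checks that its only fixed point is $(1, 1, \frac{p-1}{4})$, and then reads off a fixed point of $\psi$ to write $p = x^2 + (2y)^2$. The boundary cases $x = y - z$ and $x = 2y$ never occur for prime $p$, so the code does not treat them separately.

```python
from math import isqrt


def zagier_set(p):
    """S = {(x, y, z) in positive integers : x^2 + 4yz = p}."""
    S = []
    for x in range(1, isqrt(p) + 1):
        rest = p - x * x
        if rest > 0 and rest % 4 == 0:
            q = rest // 4
            S.extend((x, y, q // y) for y in range(1, q + 1) if q % y == 0)
    return S


def phi(v):
    """Zagier's involution, case by case as in the exercise."""
    x, y, z = v
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)


def two_squares_zagier(p):
    """Return (x, 2y) with x^2 + (2y)^2 = p for a prime p = 1 (mod 4)."""
    S = zagier_set(p)
    members = set(S)
    assert all(phi(v) in members and phi(phi(v)) == v for v in S)   # part (a)
    assert [v for v in S if phi(v) == v] == [(1, 1, (p - 1) // 4)]  # part (b)
    x, y, _ = next(v for v in S if v[1] == v[2])                     # part (d)
    return x, 2 * y


print(two_squares_zagier(13), two_squares_zagier(97))
```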


Appendix 9A. Proof of 
the local-global principle 
for quadratic equations 


In this appendix we will give the difficult part of the proof of the local-global principle for quadratic equations, Theorem 9.4, as discussed at length in section ….

The local-global principle for quadratic equations. Let $a, b, c$ be given pairwise coprime, squarefree integers. There are solutions in non-zero integers $\ell, m, n$ to $a\ell^2 + bm^2 + cn^2 = 0$ with $(a\ell^2, bm^2) = 1$
if and only if there are solutions in
non-zero real numbers $\lambda, \mu, \nu$ to $a\lambda^2 + b\mu^2 + c\nu^2 = 0$,
and, for all positive integers $r$, there exist
residue classes $u, v, w \pmod r$ for which $au^2 + bv^2 + cw^2 \equiv 0 \pmod r$,
with $(au^2, bv^2, cw^2, r) = 1$.


Our proof depends on an understanding of lattices. 


9.8. Lattices and quotients 


A lattice $\Lambda$ in $\mathbb{R}^n$ is the set of points obtained by integer linear combinations of $n$ given linearly independent vectors. If the basis is $x_1, x_2, \ldots, x_n \in \mathbb{R}^n$, then

$$\Lambda = \{m_1x_1 + m_2x_2 + \cdots + m_nx_n : m_1, m_2, \ldots, m_n \in \mathbb{Z}\}.$$


One can see that $\Lambda$ is an additive group, but it also has some geometry connected to it. The fundamental domain of $\Lambda$ with respect to $x_1, x_2, \ldots, x_n$ is the set

$$P = P(\Lambda) := \{a_1x_1 + a_2x_2 + \cdots + a_nx_n : 0 \le a_i < 1\},$$




the interior (and part of the boundary) of one of the diamond-shaped cells in Figure 9.1. If $\lambda \in \Lambda$, then $\lambda + P$ gives us another of the diamond shapes, shifted from the original by $\lambda$. Therefore the sets $\lambda + P$, $\lambda \in \Lambda$, are disjoint and their union is $\mathbb{R}^n$. Therefore $P(\Lambda)$ is a set of representatives of

$$\mathbb{R}^n/\Lambda,$$

which is often called “$\mathbb{R}^n$ mod $\Lambda$”.


Figure 9.1. Constructing a lattice in $\mathbb{R}^2$, generated by vectors $u$ and $v$. The shaded grey parallelogram is the fundamental domain $P(\Lambda)$. The dots represent the same point in $\mathbb{R}^2/\Lambda$ repeated in each copy of $P(\Lambda)$; that is, they are the points $P + \lambda$ for each vector $\lambda \in \Lambda$.


In the non-trivial example with $n = 1$, for which $\Lambda = \mathbb{Z}$, we can write every real number $z$ as $m + a$, where $m \in \mathbb{Z}$ and $a \in [0, 1)$, letting $m = [z]$ and $a = \{z\}$. We prefer to think of this as $z = a$ in the group $\mathbb{R}/\mathbb{Z}$, since their difference, $m$, is an integer. This generalizes to $n$ dimensions, in which case we can identify $\mathbb{R}^n/\Lambda$ with $(\mathbb{R}/\mathbb{Z})^n$.

The determinant $\det(\Lambda)$ of $\Lambda$ is the volume of $P$; in fact $\det(\Lambda) = |\det(A)|$, where $A$ is the matrix with column vectors $x_1, x_2, \ldots, x_n$ (written as vectors in $\mathbb{R}^n$). A convex body $K$ is a bounded convex open subset³ of $\mathbb{R}^n$.


³ These are all common terms in geometry. A set $S \subset \mathbb{R}^n$ is bounded if it can be contained inside a ball of some finite radius. The set $S$ is convex if all the points on the straight line between any two points of $S$ also belong to $S$. The set $S$ is open if there is a ball around any given point of $S$, perhaps of very small radius, that also is contained within $S$.




If $\Lambda \subset \mathbb{Z}^n$, then there are $\det(\Lambda)$ cosets of $\Lambda$ in $\mathbb{Z}^n$; that is,
$$|\mathbb{Z}^n/\Lambda| = \det(\Lambda).$$
In the proof of Theorem 9.1 we work with the lattice
$$\Lambda := \{(r, s) \in \mathbb{Z}^2 : r - ks \equiv 0 \pmod p\}$$

(where $k^2 \equiv -1 \pmod p$). This lattice is presented there somewhat differently from the definition here, but it can easily be seen that $\Lambda$ is generated by $(k, 1)$ and $(p, 0)$, and that $(0, p) = p(k, 1) - k(p, 0)$. Hence $\det(\Lambda) = p$; in particular we deduce that there are $p$ distinct cosets of $\Lambda$ within $\mathbb{Z}^2$.

Let $S$ be the set constructed in the proof of Theorem 9.1. $S$ is a convex set of $> p$ elements of $\mathbb{Z}^2$, so that, by the pigeonhole principle, the difference $d$ of two of them lies in the lattice $\Lambda$. The set $S$ was constructed so that the difference $d$ must lie close to the origin. Moreover $\Lambda$ was constructed so that if $(r, s) \in \Lambda$, then $r^2 + s^2 \equiv 0 \pmod p$ (since if $r \equiv ks \pmod p$, then $r^2 + s^2 \equiv (ks)^2 + s^2 = (k^2 + 1)s^2 \equiv 0 \pmod p$).

We will now develop these ideas to give a proof of the local-global principle. In 
the next section we will modify the last step to make it more elegant. 


Proof of the local-global principle. Assume that $a$, $b$, and $c$ are squarefree, pairwise coprime integers, with $a, b > 0 > c$ (so that there are non-zero real solutions to $ax^2 + by^2 + cz^2 = 0$), and that there exists a solution to

$$au^2 + bv^2 + cw^2 \equiv 0 \pmod{|abc|},$$

with $(au^2, bv^2, cw^2, abc) = 1$.⁴ We may assume that at least two of $a, b, |c|$ are $> 1$, for the case $a = b = 1$ can be proved directly from Theorem 9.2, while the case $a = 1, c = -1$ is easy, as we always have the solution $x = b - 1$, $y = 2$, $z = b + 1$. Define the lattice
$$\Lambda := \{(x, y, z) \in \mathbb{Z}^3 : aux + bvy + cwz \equiv 0 \pmod{|abc|}\}.$$
We claim that if $(x, y, z) \in \Lambda$, then
$$ax^2 + by^2 + cz^2 \equiv 0 \pmod{|abc|}.$$
We now prove that this holds mod $a$ (the cases mod $b$ and mod $|c|$ proceed analogously, so that the claim follows using the Chinese Remainder Theorem). Now if $(x, y, z) \in \Lambda$, then $bvy \equiv -cwz \pmod a$, and so
$$bv^2 \cdot by^2 = (bvy)^2 \equiv (-cwz)^2 = cw^2 \cdot cz^2 \pmod a.$$
Dividing through by $bv^2 \equiv -cw^2 \pmod a$, we deduce that $by^2 \equiv -cz^2 \pmod a$. Therefore $ax^2 + by^2 + cz^2 \equiv 0 \pmod a$, as desired.


In the next exercise we will show that $|\det(\Lambda)| = |abc|$. Let

$$S := \{(i, j, k) : 0 \le i \le [\sqrt{|bc|}],\ 0 \le j \le [\sqrt{|ac|}],\ 0 \le k \le [\sqrt{|ab|}]\}.$$

The number of integer points in $S$ is $> \sqrt{|bc|} \cdot \sqrt{|ac|} \cdot \sqrt{|ab|} = |abc| = |\mathbb{Z}^3/\Lambda|$, and so, by the pigeonhole principle, there must be two lattice points in $S$ that differ by

⁴ Lemma 9.3.1 implies that we should work modulo $8|abc|$ in proving the local-global principle. However, in this first version of our proof, we prefer not to worry about the equation modulo powers of 2. We will revisit this issue in the next section.




a non-zero element $(x, y, z) \in \Lambda$. If the two lattice points are $(i, j, k)$ and $(I, J, K)$, then

$$|x| = |i - I| \le [\sqrt{|bc|}], \quad |y| = |j - J| \le [\sqrt{|ac|}], \quad |z| = |k - K| \le [\sqrt{|ab|}].$$

These give the strict inequalities $|x| < \sqrt{|bc|}$, $|y| < \sqrt{|ac|}$, $|z| < \sqrt{|ab|}$, as none of $|bc|, |ac|, |ab|$ are squares, since at least two of $a, b, |c|$ are $> 1$ and they are pairwise coprime. Therefore $ax^2 + by^2 < 2|abc|$ and $|cz^2| < |abc|$, so that
$$-|abc| < ax^2 + by^2 + cz^2 < 2|abc|.$$
Since $ax^2 + by^2 + cz^2 \equiv 0 \pmod{|abc|}$ by our claim, this implies that either $ax^2 + by^2 + cz^2 = 0$ or $ax^2 + by^2 + cz^2 = |abc| = -abc$. We need to eliminate the second case. I know of two ways to do this. The first is inelegant and comes from simply noting that if $ax^2 + by^2 + cz^2 + abc = 0$, then

$$a(xz - by)^2 + b(ax + yz)^2 + c(ab + z^2)^2 = (ab + z^2)(ax^2 + by^2 + cz^2 + abc) = 0.$$

The second involves slightly modifying the definition of $\Lambda$, by taking the prime 2 into account more carefully, which we discuss in the next section.
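The first proof is constructive, so it can be run. The hedged Python sketch below (function name ours) enumerates the box $S$, uses a dictionary to find two points in the same coset of $\Lambda$ (that is, with the same value of $aui + bvj + cwk$ mod $|abc|$), and applies the identity above when the difference lands in the second case. It assumes the hypotheses of the proof: $a, b > 0 > c$, at least two of $a, b, |c|$ greater than 1, and a given $(u, v, w)$ solving the congruence with the coprimality condition.

```python
from itertools import product
from math import isqrt


def lattice_solution(a, b, c, u, v, w):
    """Non-zero (x, y, z) with a x^2 + b y^2 + c z^2 = 0, via the pigeonhole proof."""
    M = abs(a * b * c)
    box = product(range(isqrt(abs(b * c)) + 1),
                  range(isqrt(abs(a * c)) + 1),
                  range(isqrt(abs(a * b)) + 1))
    seen = {}
    for P in box:
        # Two box points lie in the same coset of the lattice exactly when
        # a*u*i + b*v*j + c*w*k agrees modulo |abc|.
        key = (a * u * P[0] + b * v * P[1] + c * w * P[2]) % M
        if key in seen:
            x, y, z = (s - t for s, t in zip(P, seen[key]))
            break
        seen[key] = P
    else:
        raise AssertionError("the box has more than |abc| points")
    value = a * x * x + b * y * y + c * z * z
    assert value in (0, M)          # the size bounds leave only these two cases
    if value == M:                  # a x^2 + b y^2 + c z^2 + abc = 0: use the identity
        x, y, z = x * z - b * y, a * x + y * z, a * b + z * z
    return x, y, z


print(lattice_solution(2, 3, -5, 1, 1, 1), lattice_solution(3, 5, -2, 1, 1, 2))
```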


Exercise 9.8.1. (a) Show that there exist integers $U, V, W$, coprime with $abc$, for which $U \equiv u \pmod{bc}$, $V \equiv v \pmod{ac}$, $W \equiv w \pmod{ab}$, so that $aU^2 + bV^2 + cW^2 \equiv 0 \pmod{|abc|}$.
(b) Let $U^{-1}$ be an integer $\equiv 1/U \pmod{abc}$ and $W^{-1}$ be an integer $\equiv 1/W \pmod{abc}$. Show that $\Lambda$ is generated by the vectors $(1, VU^{-1}, WU^{-1})$, $(0, c, -bVW^{-1})$, and $(0, 0, ab)$.
(c) Deduce that $\det(\Lambda) = |abc|$.


9.9. A better proof of the local-global principle
The idea is to construct a lattice, based on that in the previous section, but now of determinant $4|abc|$. We begin by defining
$$\Lambda_0 := \{(x, y, z) \in \mathbb{Z}^3 : aux + bvy + cwz \equiv 0 \pmod{|abc|}\}.$$
If $c$ is even, then let
$$\Lambda := \{(x, y, z) \in \Lambda_0 : y \equiv x \pmod 4 \text{ and } z \equiv wx \pmod 2\},$$
based on the given solution $(u, v, w)$. We construct $\Lambda$ analogously if $a$ or $b$ is even.


If $abc$ is odd, then one of $u, v, w$ must be even (as $au^2 + bv^2 + cw^2 \equiv 0 \pmod 2$), say $w$. If so, then let

$$\Lambda := \{(x, y, z) \in \Lambda_0 : y \equiv x \pmod 2 \text{ and } z \equiv 0 \pmod 2\},$$
using the given solution $(u, v, w)$. We construct $\Lambda$ analogously if $u$ or $v$ is even.
Exercise 9.9.1. (a) Prove that if $(x, y, z) \in \Lambda$, then $ax^2 + by^2 + cz^2 \equiv 0 \pmod{4|abc|}$.
(b) Prove that $\det(\Lambda) = 4|abc|$.


Consider the set of integer points

$$S := \{(i, j, k) : 0 \le i \le [\sqrt{2|bc|}],\ 0 \le j \le [\sqrt{2|ac|}],\ 0 \le k \le [2\sqrt{|ab|}]\}.$$

The number of lattice points in $S$ is $> \sqrt{2|bc|} \cdot \sqrt{2|ac|} \cdot 2\sqrt{|ab|} = 4|abc| = |\mathbb{Z}^3/\Lambda|$ by exercise 9.9.1(b), and so, by the pigeonhole principle, there must be two lattice points in $S$ that differ by a non-zero element $(x, y, z) \in \Lambda$. If the two lattice points are $(i, j, k)$ and $(I, J, K)$, then

$$|x| = |i - I| \le [\sqrt{2|bc|}], \quad |y| = |j - J| \le [\sqrt{2|ac|}], \quad |z| = |k - K| \le [2\sqrt{|ab|}].$$




Therefore $ax^2 + by^2 < 4|abc|$ and $|cz^2| < 4|abc|$ (as equality would only be possible if $a = b = 1$), and so
$$|ax^2 + by^2 + cz^2| < 4|abc|.$$
Now, since $(x, y, z) \in \Lambda$, we know that
$$ax^2 + by^2 + cz^2 \equiv 0 \pmod{4|abc|},$$

by exercise 9.9.1(a), and so we must have $ax^2 + by^2 + cz^2 = 0$, as desired.


A by-product of this proof is that the smallest non-trivial solution satisfies
$$|a\ell^2|, |bm^2|, |cn^2| < 4|abc|.$$
In 1950, Holzer showed that one may replace $4|abc|$ by $|abc|$.


Exercise 9.9.2. Give infinitely many examples in which $\max\{|a\ell^2|, |bm^2|, |cn^2|\} = |abc|$ in the smallest non-trivial solution of $a\ell^2 + bm^2 + cn^2 = 0$.


Appendix 9B. Reformulation 
of the local-global principle 


9.10. The Hilbert symbol 


Given pairwise coprime integers $a, b, c$ and prime $p$, we define the Hilbert symbol $\left(\frac{a, b, c}{p}\right)$, which equals 1 if, for every $k \ge 1$, there is a solution $u, v, w \pmod{p^k}$ with $(au^2, bv^2, cw^2, p) = 1$ to $au^2 + bv^2 + cw^2 \equiv 0 \pmod{p^k}$, and which equals $-1$ otherwise. If $p$ is odd, then we need only consider solutions mod $p$, and if $p = 2$, then we need only consider solutions mod 8, as explained in the proof of Lemma 9.3.1.


We also define

$$\left(\frac{a, b, c}{\infty}\right) = \begin{cases} 1 & \text{if there are real solutions } u, v, w \ne 0 \text{ to } au^2 + bv^2 + cw^2 = 0,\\ -1 & \text{otherwise.}\end{cases}$$


Theorem 9.6. For any given pairwise coprime, squarefree integers $a, b, c$ we have

$$\left(\frac{a, b, c}{\infty}\right) \prod_{p \text{ prime}} \left(\frac{a, b, c}{p}\right) = 1.$$


This Hilbert product theorem has several remarkable consequences, alluded to in section …. The local-global principle implies that if there are no non-zero integer solutions to $au^2 + bv^2 + cw^2 = 0$, then there must be a value of $\ell$, either $\infty$ or a prime $p$, for which $\left(\frac{a, b, c}{\ell}\right) = -1$. However, Theorem 9.6 implies that the number of such $\ell$ is always even. This explains the remarks in the second paragraph of section 9.3. If there are real solutions but not integer solutions, then there is an even number




of primes $p$ for which $\left(\frac{a, b, c}{p}\right) = -1$, that is, an even number of primes $p$ for which there are no solutions mod $p$, or mod 8 if $p = 2$. This implies that one can neglect to mention one local criterion in formulating the local-global principle (for if there are no solutions, there must still remain one such local criterion for which there are no solutions). This explains why, in Legendre’s formulation of the local-global principle, one is able to avoid being careful about the role of the prime 2 (which, as we will see in proving Theorem 9.6, is often more complicated than the others).
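Theorem 9.6 can be tested numerically straight from the definitions. The hedged sketch below (helper names ours) computes each local symbol by brute force, modulo $p$ for odd $p$ and modulo 8 for $p = 2$ as the definition allows, skips the primes not dividing $2abc$ (whose symbols equal 1 by exercise 8.9.9), and multiplies in the real symbol. Over all pairwise coprime, squarefree triples with entries in $[-6, 6]$ the product should always be 1.

```python
from itertools import product
from math import gcd, isqrt


def squarefree(n):
    n = abs(n)
    return n > 0 and all(n % (q * q) for q in range(2, isqrt(n) + 1))


def prime_factors(n):
    n, p, out = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            out.append(p)
            while n % p == 0:
                n //= p
        p += 1
    return out + ([n] if n > 1 else [])


def local_symbol(a, b, c, p):
    """(a,b,c / p) by brute force: solutions mod p (p odd) or mod 8 (p = 2)."""
    m = 8 if p == 2 else p
    for u, v, w in product(range(m), repeat=3):
        A, B, C = a * u * u, b * v * v, c * w * w
        if (A + B + C) % m == 0 and gcd(gcd(A, B), gcd(C, p)) == 1:
            return 1
    return -1


def real_symbol(a, b, c):
    return -1 if (a > 0) == (b > 0) == (c > 0) else 1


def hilbert_product(a, b, c):
    """Product over infinity and all p | 2abc (the other primes contribute 1)."""
    result = real_symbol(a, b, c)
    for p in prime_factors(2 * a * b * c):
        result *= local_symbol(a, b, c, p)
    return result


print(local_symbol(1, 1, -3, 2), local_symbol(1, 1, -3, 3), hilbert_product(1, 1, -3))
```

For $x^2 + y^2 = 3z^2$ the two "bad" places are 2 and 3, an even number, as the theorem predicts.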


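Proof. The values of the Hilbert symbols remain the same if we rearrange the order of $a$, $b$, and $c$, or if we multiply each of $a, b, c$ through by $-1$. This means that we may assume that if any of $a, b, c$ are even, then it is $a$, and that at most one of $a, b, c$ is negative. Therefore $\left(\frac{a, b, c}{\infty}\right) = -(-1)^{\eta}$, where $\eta$ is the number of $a, b, c$ that are $< 0$.

Suppose that $p$ is an odd prime. If $p \nmid abc$, then $\left(\frac{a, b, c}{p}\right) = 1$ by exercise 8.9.9. If odd prime $p$ divides $abc$, say $p$ divides $a$, then $\left(\frac{a, b, c}{p}\right) = \left(\frac{-bc}{p}\right)$, and similarly if $p$ divides $b$ or $c$.

The role of the prime $p = 2$ is complicated. Let $A = a/(2, a)$, so that $A, b, c$ are odd. For convenience we define $\epsilon = 1$ if $A \equiv b \equiv c \pmod 4$, and $\epsilon = -1$ otherwise. A case-by-case analysis then yields that if $a$ is odd, then $\left(\frac{a, b, c}{2}\right) = -\epsilon$; and if $a$ is even, then $\left(\frac{a, b, c}{2}\right) = 1$ if $b + c \equiv 0$ or $-a \pmod 8$, and $-1$ otherwise. One can then verify in every case that

$$\left(\frac{a, b, c}{2}\right) = \begin{cases} -\epsilon & \text{if } a \text{ is odd},\\ -\epsilon \left(\frac{2}{|bc|}\right) & \text{if } a \text{ is even}.\end{cases}$$

Next, noting that $\left(\frac{a, b, c}{p}\right) = 1$ if $p$ does not divide $2abc$, we obtain

$$\prod_{p \text{ prime}} \left(\frac{a, b, c}{p}\right) = \left(\frac{a, b, c}{2}\right) \prod_{\substack{p \text{ odd}\\ p \mid a}} \left(\frac{-bc}{p}\right) \prod_{p \mid b} \left(\frac{-ac}{p}\right) \prod_{p \mid c} \left(\frac{-ab}{p}\right) = -(-1)^{\eta} = \left(\frac{a, b, c}{\infty}\right),$$

and the result follows. To obtain the second-to-last equality we write each product over primes as a Jacobi symbol, and use that $\left(\frac{-ac}{|b|}\right) = \left(\frac{-Ac}{|b|}\right)$ and $\left(\frac{-ab}{|c|}\right) = \left(\frac{-Ab}{|c|}\right)$ when $a$ is odd, with an extra factor $\left(\frac{2}{|b|}\right)$, respectively $\left(\frac{2}{|c|}\right)$, when $a$ is even; then the value of $\left(\frac{-1}{\cdot}\right)$ and quadratic reciprocity give that

$$\left(\frac{-bc}{|A|}\right) \left(\frac{-Ac}{|b|}\right) \left(\frac{-Ab}{|c|}\right)$$

equals $(-1)^{\eta}$ times $(-1)$ to the power

$$\frac{A-1}{2} + \frac{b-1}{2} + \frac{c-1}{2} + \frac{A-1}{2} \cdot \frac{b-1}{2} + \frac{b-1}{2} \cdot \frac{c-1}{2} + \frac{c-1}{2} \cdot \frac{A-1}{2} \equiv \begin{cases} 0 & \text{if } A \equiv b \equiv c \pmod 4,\\ 1 & \text{otherwise}, \end{cases} \pmod 2,$$

so that this product equals $(-1)^{\eta}\epsilon$.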


9.11. The Hasse-Minkowski principle 


We have seen that any quadratic polynomial with rational coefficients, in no matter how many variables, may be diagonalized and then scaled up to become a quadratic form with integer coefficients; call it $f = a_1x_1^2 + \cdots + a_nx_n^2$. The Hasse-Minkowski principle is a generalization of the local-global principle and states that any diagonal quadratic form with integer coefficients in $n \ge 3$ variables has a non-trivial solution in integers if and only if it does in the reals and modulo every prime power. Moreover, the analogy to Theorem 9.6 holds for forms in arbitrarily many variables. This gives rise to the following remarkable result:


Theorem 9.7. A diagonal quadratic form in five or more variables has a non- 
trivial solution over the integers if and only if its coefficients do not all have the 
same sign. 


The proof for this is given in the following exercise. 


Exercise 9.11.1. Suppose we are given a quadratic form $a_1x_1^2 + \cdots + a_nx_n^2$, with each $a_i \in \mathbb{Z}$.

(a) By changing variables and multiplying through by a suitable constant, show that we may assume each $a_i$ is squarefree.

(b) Prove that if the $a_i$ do not all have the same sign, then there is a non-trivial real solution to $a_1x_1^2 + \cdots + a_nx_n^2 = 0$.
Let $p$ be a given prime.

(c) By possibly multiplying through by $p$ and changing variables, show that we may assume that for every prime $p$, no more than $n/2$ of the $a_i$ are divisible by $p$.

(d) Deduce that if $n \ge 5$, then there exist integers $m_1, \ldots, m_n$ for which $a_1m_1^2 + \cdots + a_nm_n^2 \equiv 0 \pmod p$, such that there exists some $j$ for which $a_jm_j \not\equiv 0 \pmod p$.

(e) Prove Theorem 9.7 using the Hasse-Minkowski principle.


This is all discussed in detail in Part I of the wonderful book


[1] J.-P. Serre, A course in arithmetic, Graduate Texts in Mathematics 7, Springer-Verlag, 1973. 


Appendix 9C. The number 
of representations 


9.12. Distinct representations as sums of two squares
We modify the proof of Theorem 9.1 to obtain the following:
Proposition 9.12.1. If $n$ is a squarefree positive integer and $m^2 \equiv -1 \pmod n$, then there exist coprime integers $r, s$ such that $n = r^2 + s^2$ and $r \equiv ms \pmod n$.


Proof. There are $> n$ integers constructed in the set

$$\{j + km : 0 \le j, k \le [\sqrt n]\},$$

and so two must be congruent mod $n$, by the pigeonhole principle. If $j + km \equiv J + Km \pmod n$, then let $r = j - J$ and $s = K - k$, so that $r \equiv ms \pmod n$, where $|r|, |s| < \sqrt n$, and $r$ and $s$ are not both 0. Now

$$r^2 + s^2 \equiv (ms)^2 + s^2 = s^2(m^2 + 1) \equiv 0 \pmod n$$

and $0 < r^2 + s^2 < 2n$. The only integer between 0 and $2n$ that is divisible by $n$ is $n$ itself, and so we deduce that $r^2 + s^2 = n$. Moreover $(r, s) = 1$, or else $n$ would be divisible by $(r, s)^2$, contrary to the hypothesis.
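The proof of Proposition 9.12.1 is a pigeonhole search that can be run directly. In the hedged sketch below (function name ours), a dictionary records the residues $j + km \bmod n$ for $0 \le j, k \le [\sqrt n]$, and the first repeated residue gives $r = j - J$, $s = K - k$.

```python
from math import gcd, isqrt


def two_squares_from_root(n, m):
    """Given squarefree n and m^2 = -1 (mod n), return (r, s) with
    n = r^2 + s^2 and r = m s (mod n), as in Proposition 9.12.1."""
    L = isqrt(n)
    seen = {}
    for j in range(L + 1):
        for k in range(L + 1):
            key = (j + k * m) % n
            if key in seen:
                J, K = seen[key]
                return j - J, K - k          # r = j - J, s = K - k
            seen[key] = (j, k)
    raise AssertionError("pigeonhole: there are (L + 1)^2 > n points")


print(two_squares_from_root(5, 2), two_squares_from_root(65, 8))
```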


Lemma 9.12.1. If $n = a^2 + b^2 = c^2 + d^2$ with $(a, b) = (c, d) = 1$ and $ad \equiv bc \pmod n$, then either $c = \pm a$ and $d = \pm b$, or $c = \mp b$ and $d = \pm a$.

Proof. By … we have $(ac + bd)^2 + (ad - bc)^2 = (a^2 + b^2)(c^2 + d^2) = n^2$. Now $n^2$ divides $(ad - bc)^2$ by the hypothesis, and so divides $(ac + bd)^2$. Therefore if $u = (ac + bd)/n$ and $v = (ad - bc)/n$, then $u$ and $v$ are integers for which $u^2 + v^2 = 1$. This implies that either $u = 0$ or $v = 0$. Now suppose $v = 0$, so that $ad = bc$. Then $a$ divides $ad = bc$, and so $a$ divides $c$, as $(a, b) = 1$. An analogous argument gives that $c$ divides $a$, and so $c = \pm a$ and therefore $d = \pm b$. Otherwise $u = 0$ and $ac = -bd$. The analogous argument yields that $d = \pm a$ and $c = \mp b$.




Let $R(n)$ denote the number of proper representations of $n$ as the sum of two squares.

Theorem 9.8. If $n$ is squarefree, then

$$R(n) = 4 \prod_{p \text{ prime}:\ p \mid n} \left(1 + \left(\frac{-4}{p}\right)\right).$$


Proof. If $n = a^2 + b^2$, then $(a, b) = 1$, as $n$ is squarefree. Therefore $(b, n) = 1$, or else $(b, n)$ divides $n - b^2 = a^2$, which would imply that $(a, b) > 1$. Then $m \equiv a/b \pmod n$ is well-defined and satisfies $m^2 \equiv -1 \pmod n$. Now for each such $m$ there exists a solution to $n = r^2 + s^2$ with $(r, s) = 1$ and $r \equiv ms \pmod n$ by Proposition 9.12.1, and there are in fact exactly four such solutions by Lemma 9.12.1. Hence $R(n)$ equals four times the number of square roots of $-1 \pmod n$. The result then follows by exercise 8.9.8(b) and (c).


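A similar discussion extends the result to all $n$, to obtain that

$$R(n) = 4 \sum_{d \mid n} \left(\frac{-4}{d}\right). \tag{9.12.1}$$

We see that the right-hand side equals $4\left(1 * \left(\frac{-4}{\cdot}\right)\right)(n)$, and so

$$\sum_{n \ge 1} \frac{R(n)}{n^s} = 4\,\zeta(s)\, L\left(s, \left(\frac{-4}{\cdot}\right)\right),$$

where $L\left(s, \left(\frac{-4}{\cdot}\right)\right)$ is the Dirichlet L-function that we encountered in section 8.17 of appendix 8D.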


Exercise 9.12.1. Give another proof of exercise using exercise 


Exercise 9.12.2. Prove that $R(n)/4$ equals the number of divisors of $n$ that are $\equiv 1 \pmod 4$ minus the number of divisors of $n$ that are $\equiv 3 \pmod 4$.
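Both Theorem 9.8 and (9.12.1) are easy to test by brute force. The hedged sketch below (function names ours) counts all pairs $(a, b) \in \mathbb{Z}^2$ with $a^2 + b^2 = n$, which is what the divisor sum in (9.12.1) counts; for squarefree $n$ every representation is proper, so the same count should also match the product in Theorem 9.8.

```python
from math import isqrt


def count_reps(n):
    """Number of (a, b) in Z^2 with a^2 + b^2 = n, counting signs and order."""
    total = 0
    for a in range(-isqrt(n), isqrt(n) + 1):
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:
            total += 1 if b == 0 else 2      # b and -b
    return total


def chi4(d):
    """The character (-4/d): 0 if d is even, 1 if d = 1 (mod 4), -1 if d = 3 (mod 4)."""
    return 0 if d % 2 == 0 else (1 if d % 4 == 1 else -1)


def divisor_sum_formula(n):
    """4 times the sum over d | n of (-4/d), as in (9.12.1) and exercise 9.12.2."""
    return 4 * sum(chi4(d) for d in range(1, n + 1) if n % d == 0)


def squarefree_formula(n):
    """Theorem 9.8: 4 times the product over primes p | n of (1 + (-4/p))."""
    result, p = 4, 2
    while p * p <= n:
        if n % p == 0:
            result *= 1 + chi4(p)
            n //= p
        p += 1
    if n > 1:
        result *= 1 + chi4(n)
    return result


print([count_reps(n) for n in range(1, 11)])
```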


A different perspective. In $\mathbb{Z}[i]$ there are four units $1, -1, i, -i$.


If $p$ is a prime $\equiv 1 \pmod 4$, then we can write $p = a^2 + b^2$, which factors in $\mathbb{Z}[i]$ as $(a + bi)(a - bi)$. We can replace $a + bi$ by $x + yi = u(a + bi)$ for any unit $u$. Note that if $u = i$, then $x = -b$, $y = a$, and so $x/y = -b/a \equiv a/b \pmod p$; in other words, $x/y \pmod p$ does not vary as we range over the units $u$ (which is essentially another proof of Lemma 9.12.1 for $n = p$). So if we write $P = a + bi$, then we can take $p$ as the norm of $x + yi$ for $x + yi = uP$ or $u\bar P$, where $u$ is a unit. Therefore there are $8 = R(p)$ possibilities for $x$ and $y$.

If $p = 2$, then $1 - i = -i(1 + i)$, so we have $R(2) = 4$. Now if $n = p_1 \cdots p_k$, where each $p_j \equiv 1 \pmod 4$ is the norm of $P_j$, then the possible representations of $n$ as a sum of two squares, $x^2 + y^2$, correspond to all possible factorizations of $x + iy$ as

$$x + iy = u\, A_1 A_2 \cdots A_k,$$

where $u$ is any unit and each $A_j = P_j$ or $\bar P_j$, a total of $4 \cdot 2^k$ possibilities. Moreover, if $n = 2^e m^2 p_1^{e_1} \cdots p_k^{e_k}$, where each $p_j \equiv 1 \pmod 4$ and every prime factor of $m$ is $\equiv 3 \pmod 4$, then the possible representations of $n$ as a sum of two squares, $x^2 + y^2$, correspond to all

$$x + iy = u\, m\, (1 + i)^e\, P_1^{r_1} \bar P_1^{e_1 - r_1}\, P_2^{r_2} \bar P_2^{e_2 - r_2} \cdots P_k^{r_k} \bar P_k^{e_k - r_k},$$




where $u$ is any unit and each $r_j$ lies in the range $0 \le r_j \le e_j$, a total of $4\prod_{j=1}^{k}(e_j + 1)$ possibilities. This yields our formula for $R(n)$ above.


The average number of representations. If we sum up $R(n)$ over all $n \le N$, we are asking for the number of pairs of integers $a, b$ for which $a^2 + b^2 = n \le N$, in other words the number of integer points $(a, b) \in \mathbb{Z}^2$ inside the circle $x^2 + y^2 \le N$. We can approach this question by drawing a square of area 1 around each such point (whose corners lie at $(a \pm \frac12, b \pm \frac12)$). These squares are disjoint other than their boundaries, so the area of their union equals the number of pairs $a, b$. The area contained inside the union of the squares is a good approximation to the area inside the circle of radius $\sqrt N$, which has area $\pi N$. Therefore the average number of representations of integers up to $N$ as the sum of two squares is

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n \le N} R(n) = \lim_{N \to \infty} \frac{1}{N}\, \#\{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 \le N\},$$

which should be about $\pi$.

We can be more precise in estimating $\#\{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 \le N\}$. We saw above that this equals the area of the union of the unit boxes whose centers lie in the circle $x^2 + y^2 \le N$. This is close to the area $\pi N$ of this circle, but there is an error in this estimate, which comes from the area of the unit boxes that straddle the boundary (that is, that are partly inside and partly outside the circle). Since each box has area one, the error is therefore bounded by the number of such boxes.


Figure 9.2. A circle approximated by boxes of area one, shading those that straddle the boundary.




To bound the number of such boxes that straddle the boundary, we draw the boxes on the boundary and count the boxes by the number of times neighboring boxes share a vertical edge, plus the number of times they share a horizontal edge. For example, to go counterclockwise around the first quadrant of the circle from $(\sqrt{N}, 0)$ to $(0, \sqrt{N})$, we begin with a boundary box with center $(m, 0)$, where $m$ is the nearest integer to $\sqrt{N}$. As we go counterclockwise around the boundary the next box is $(m, 1)$, $(m-1, 0)$, or $(m-1, 1)$. If we are at the box with center $(i, j)$ with $i > 0$ and $j \ge 0$, then the next box is one of the three $(i, j+1)$, $(i-1, j)$, or $(i-1, j+1)$. In particular, the value of $j - i$ increases monotonically from $-m$ to $m$, increasing by either 1 or 2 as we move on to the next box. Therefore the number of boundary-straddling boxes in the first quadrant is at most $2m + 1 \le 2\sqrt{N} + 2$. Adding together similar contributions from all four quadrants we have

$$\big|\#\{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 \le N\} - \pi N\big| \le 8\sqrt{N} + 8.$$
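The counting argument can be tested numerically. The Python sketch below is our own illustration: it doubles all coordinates so that a box centered at $(i, j)$ straddles the circle exactly when the smallest and largest values of $x^2 + y^2$ on the box lie strictly on either side of $N$, counts such boxes with $i, j \ge 0$, and compares the lattice-point error with $8\sqrt{N} + 8$.

```python
from math import isqrt, pi, sqrt

def box_extent(i):
    """Smallest and largest |2x| for x in [i - 1/2, i + 1/2], where i >= 0."""
    return (0 if i == 0 else 2 * i - 1), 2 * i + 1

def straddling_boxes_first_quadrant(N):
    """Count unit boxes centred at (i, j), i, j >= 0, that are partly inside and
    partly outside x^2 + y^2 = N.  In doubled coordinates a box straddles
    iff min(X^2 + Y^2) < 4N < max(X^2 + Y^2) over the box."""
    count = 0
    limit = isqrt(N) + 1               # no box with i or j beyond this can straddle
    for i in range(limit + 1):
        xlo, xhi = box_extent(i)
        for j in range(limit + 1):
            ylo, yhi = box_extent(j)
            if xlo * xlo + ylo * ylo < 4 * N < xhi * xhi + yhi * yhi:
                count += 1
    return count

def lattice_points_in_disc(N):
    r = isqrt(N)
    return sum(2 * isqrt(N - a * a) + 1 for a in range(-r, r + 1))

for N in (10, 100, 1000, 10000):
    error = abs(lattice_points_in_disc(N) - pi * N)
    print(N, straddling_boxes_first_quadrant(N), 2 * sqrt(N) + 2,
          round(error, 2), 8 * sqrt(N) + 8)
```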
In Gauss’s circle problem one wishes to improve this error term as much as possible. It is conjectured that for any fixed $\epsilon > 0$, one has the bound $\le N^{1/4 + \epsilon}$ if $N$ is sufficiently large, and it is known that there are arbitrarily large $N$ for which the error term is considerably larger than $N^{1/4}$ (so one cannot take $\epsilon = 0$).


Appendix 9D. Descent and the quadratics


9.13. Further solutions through linear algebra 


We wish to determine all pairs of positive integers $a > b$ for which $ab + 1$ divides $a^2 + b^2$. If we have a solution with $a^2 + b^2 = k(ab + 1)$ for some positive integer $k$, then $a$ is a root of the quadratic polynomial

$$x^2 - kb\,x + (b^2 - k).$$

If $c$ is the other root, then $a + c = kb$, so $c = kb - a$ is another integer for which $b^2 + c^2 = k(bc + 1)$. Now $c \ge 0$, or else $bc + 1 \le 0$, in which case $b^2 + c^2 = k(bc + 1) \le 0$, which is impossible. If $c = 0$, then, looking back at our equations, we see that $k = b^2$ and $a = b^3$. Otherwise $c > 0$, so that $b^2 - k = ac > 0$ and therefore $c = (b^2 - k)/a < b^2/b = b$.

We have proved that $0 \le c < b$, which means that $(b, c)$ is a smaller solution than the original solution $(a, b)$, with the same quotient $k$. We can iterate this map, $(a, b) \mapsto (b, kb - a)$, to eventually descend, after a finite number of steps, to a basic solution. We only stop descending when $c = 0$, which means that $k$ must be a square, a fact that is far from obvious in the formulation of the problem.
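The descent is easy to run by machine. In this Python sketch (the name `descend` is ours) we start from a solution with $a \ge b > 0$, apply $(a, b) \mapsto (b, kb - a)$ until the second entry vanishes, and observe that we land on $(\sqrt{k}, 0)$.

```python
from math import isqrt

def descend(a, b):
    """Vieta-jump (a, b) -> (b, k*b - a) until the second entry is 0.
    Assumes a >= b > 0 and that a*b + 1 divides a^2 + b^2."""
    k, rem = divmod(a * a + b * b, a * b + 1)
    if rem:
        raise ValueError("a*b + 1 does not divide a^2 + b^2")
    chain = [(a, b)]
    while b > 0:
        a, b = b, k * b - a      # (b, c) with 0 <= c < b, as proved above
        chain.append((a, b))
    return k, chain

print(descend(30, 8))   # (4, [(30, 8), (8, 2), (2, 0)])
```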

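To obtain all solutions, we simply invert our map: Writing $k = m^2$ for some integer $m \ge 1$, we begin with the solution $(m, 0)$ to $m^2 + 0^2 = k(m \cdot 0 + 1)$ and obtain all others by iterating the map

$$(b, c) \mapsto (kb - c, b).$$

This map is better understood through matrices: We have

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} kb - c \\ b \end{pmatrix} = \begin{pmatrix} k & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} b \\ c \end{pmatrix},$$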






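the matrix representing a transformation of determinant 1. Therefore, if $k = m^2$, then all solutions to $a^2 + b^2 = k(ab + 1)$ in non-negative integers $a, b$ are given by

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} m^2 & -1 \\ 1 & 0 \end{pmatrix}^{n}\begin{pmatrix} m \\ 0 \end{pmatrix}$$

for some integers $m \ge 1$ and $n \ge 0$.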
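Running the map forwards recovers the solutions. The sketch below (a hypothetical helper `solutions`, written for illustration) applies $(b, c) \mapsto (kb - c, b)$, that is, the matrix $\begin{pmatrix} k & -1 \\ 1 & 0 \end{pmatrix}$, repeatedly to $(m, 0)$.

```python
def solutions(m, count):
    """First `count` non-negative solutions (a, b) of a^2 + b^2 = k(ab + 1), k = m^2,
    obtained by applying (b, c) -> (k*b - c, b) repeatedly to (m, 0)."""
    k = m * m
    pair = (m, 0)
    out = []
    for _ in range(count):
        out.append(pair)
        b, c = pair
        pair = (k * b - c, b)    # the matrix [[k, -1], [1, 0]] applied to (b, c)
    return out

print(solutions(2, 4))   # [(2, 0), (8, 2), (30, 8), (112, 30)]
```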


Exercise 9.13.1. Fix integer $k \ge 1$. Let $x_0 = 0$, $x_1 = 1$, and $x_n = kx_{n-1} - x_{n-2}$ for all $n \ge 2$. Prove that all solutions to $a^2 + b^2 = k(ab + 1)$ in non-negative integers $a, b$ with $k = m^2$ are given by $a = mx_n$, $b = mx_{n-1}$ for some integer $n \ge 1$.


Exercise 9.13.2. Prove that if $A$ is any 2-by-2 matrix and the vector $u_n = A^n u_0$ for some given vector $u_0$, then $u_n$ satisfies a second-order linear recurrence.


Other quadratic equations can also be understood by recursions. Perhaps the 
most famous is the Markov equation. 


9.14. The Markov equation 


Here we seek all triples of positive integers $x, y, z$ for which

$$x^2 + y^2 + z^2 = 3xyz.$$

One can find many solutions: $(1, 1, 1), (1, 1, 2), (1, 2, 5), (1, 5, 13), (2, 5, 29), \ldots$. If $(a, b, c)$ is a solution, then $a$ is a root of the quadratic $x^2 - 3bc\,x + (b^2 + c^2)$, the other root being $3bc - a$, and so we obtain a new solution $(3bc - a, b, c)$. One can perform this same procedure singling out $b$ or $c$ instead of $a$, and get different solutions. For example, starting from $(1, 2, 5)$ one obtains the solutions $(29, 2, 5)$, $(1, 13, 5)$, and $(1, 2, 1)$, respectively.


If we fix one coordinate, say $z = c$, then we can get a new solution from an old one via the map

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 3c & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a_0 \\ b_0 \end{pmatrix},$$

and then this map can be repeated arbitrarily often, as in the previous section, to obtain infinitely many solutions.


Despite knowing there are infinitely many, the solutions to the Markov equation 
remain mysterious. For example, one open question is to determine all of the 
integers that appear in a Markov triple. The first few are 


1, 2, 5, 13, 29, 34, 89, 169, 194, 233, 433, 610, 985, 1325, ….
It is believed that they are quite sparse. 
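The Vieta moves described above are enough to list the small Markov numbers. Here is a Python sketch of our own, assuming the standard fact that every Markov triple is reached from $(1, 1, 1)$ by these moves along a tree in which the largest entry only increases as one moves away from $(1, 1, 1)$; that is why triples whose largest entry exceeds the bound can be discarded.

```python
def markov_numbers(limit):
    """Sorted list of integers <= limit occurring in some Markov triple,
    found by the moves x -> 3yz - x (and likewise for y, z) from (1, 1, 1)."""
    seen, stack = {(1, 1, 1)}, [(1, 1, 1)]
    while stack:
        x, y, z = stack.pop()
        for t in ((3 * y * z - x, y, z), (x, 3 * x * z - y, z), (x, y, 3 * x * y - z)):
            if max(t) > limit:
                continue              # further along the tree the maxima only grow
            key = tuple(sorted(t))
            if key not in seen:
                seen.add(key)
                stack.append(key)
    return sorted({n for triple in seen for n in triple})

print(markov_numbers(1325))
# [1, 2, 5, 13, 29, 34, 89, 169, 194, 233, 433, 610, 985, 1325]
```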


Exercise 9.14.1. Determine what solutions are obtained from $(1, 1, 1)$ by using the maps $(x, y) \mapsto (3y - x, y)$ and $(x, y) \mapsto (x, 3x - y)$.


9.15. Apollonian circle packing 


To my taste, the most beautiful such problem is the Apollonian circle packing problem.⁵ Take three circles that touch each other (for example, take three coins


⁵Apollonius lived in Perga, 262-190 B.C.




and push them together): 


Figure 9.3. Three mutually tangent circles with a shaded crescent shape in between. 


In between the circles one has a crescent-type shape (a hyperbolic triangle). 


There are two circles that are tangent to each of these three circles: Inside that crescent shape one can inscribe a (unique) circle that touches all three of the original circles. There is also a unique circle that contains all of the original circles and touches each of them.


Figure 9.4. Two new circles, each tangent to all three of the old circles. 


What is the relationship between the radii of the new circles and the radii of the original circles? Define the curvature of a circle to be $C/r$, where $r$ is the radius, for some appropriately selected constant $C > 0$. In 1643 Descartes, in a letter to Princess Elisabeth of Bohemia, noted that if $b$, $c$, and $d$ are the curvatures of the original three circles, then the curvatures $a$ of the two new circles both satisfy the quadratic equation

$$2(a^2 + b^2 + c^2 + d^2) = (a + b + c + d)^2,$$




that is,

$$a^2 + b^2 + c^2 + d^2 - 2(ab + bc + cd + da + ac + bd) = 0.$$

Therefore given $b$, $c$, and $d$, there are two possibilities for $a$, the roots of the quadratic equation

$$x^2 - 2(b + c + d)x + (b^2 + c^2 + d^2 - 2bc - 2cd - 2bd) = 0.$$


We select $C$ so that the first three curvatures $b$, $c$, and $d$ are integers with $\gcd(b, c, d) = 1$. We will focus on the case that $a$ is also an integer; for example, if we start with $b = c = 2$ and $d = 3$, we have $a^2 - 14a - 15 = 0$, so that $a = -1$ or $a = 15$. Evidently $a = -1$ corresponds to the outer circle⁶ and $a = 15$ to the inner one.


If we have one solution $(a, b, c, d)$, then we also have another solution $(A, b, c, d)$ for which

$$A = 2(b + c + d) - a.$$

Moreover if $a$, $b$, $c$, and $d$ are integers, then so is $A$. We can iterate this (using any of the variables $b$, $c$, or $d$ in place of $a$) to obtain infinitely many Apollonian circles. These eventually tile the whole of the outer circle, as each new circle fills in part of the crescent in between three existing circles.
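The replacement $A = 2(b + c + d) - a$ can be iterated by machine to list the curvatures in a packing. The Python sketch below is our own illustration (the function names are hypothetical); it assumes that each newly inscribed circle has larger curvature than the three circles it is tangent to, so that any move producing a curvature above the bound can be discarded.

```python
def descartes_ok(q):
    """Descartes's relation 2(a^2 + b^2 + c^2 + d^2) = (a + b + c + d)^2."""
    return 2 * sum(x * x for x in q) == sum(q) ** 2

def apollonian_curvatures(root, bound):
    """Sorted distinct curvatures <= bound in the packing generated by the
    integer quadruple `root`, using the moves a -> 2(b + c + d) - a."""
    assert descartes_ok(root)
    start = tuple(sorted(root))
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for i in range(4):
            new = 2 * (sum(q) - q[i]) - q[i]     # the other root, A
            if new > bound:
                continue                          # its descendants are even larger
            nq = tuple(sorted(q[:i] + (new,) + q[i + 1:]))
            if nq not in seen:
                seen.add(nq)
                stack.append(nq)
    return sorted({x for q in seen for x in q})

print(apollonian_curvatures((-1, 2, 2, 3), 30))
# [-1, 2, 3, 6, 11, 14, 15, 18, 23, 26, 27, 30]
print(157 in apollonian_curvatures((-11, 21, 24, 28), 200))   # True
```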


In this example, we take the three most common American coins, a quarter, a nickel, and a dime, which have diameters of 24, 21, and 18 mm, respectively, to the nearest millimeter. In this case we define the curvature of a circle of diameter $D$ mm to be $504/D$ (that is, $C/r$ with $C = 252$), yielding curvatures of 21, 24, and 28, respectively. We proceed by filling in each successive crescent shape with a mutually tangent circle. What emerges is a tiling of the whole outer circle (which has curvature $-11$) by circles with larger and larger positive integer curvatures.⁷


There are many questions that can be asked: What integers appear as curvatures in a given packing? There are some integers that cannot appear because of congruence restrictions. For example, since $\gcd(b, c, d) = 1$, exactly two of $a, b, c, d$ are odd, and one can check that the map $a \mapsto 2(b + c + d) - a$ leaves $a \bmod 4$ unchanged; so every curvature in the packing is congruent mod 4 to one of $a, b, c, d$. (In the packing starting from $-1, 2, 2, 3$, every curvature is $\equiv 2$ or $3 \pmod 4$.) The conjecture is that all sufficiently large integers that satisfy these congruence constraints, which can all be described mod 24, will appear as curvatures in the given packing. Although this is an open question, we do know that a positive proportion of the integers appear in any such packing, and that there are a

⁶The negative sign is intriguing. The mathematics gives a negative integer which surely makes no sense; but can the mathematics lie? A more in-depth analysis indicates that the negative sign should be interpreted as meaning that whereas the interior of a circle usually means all points inside the circle, when we have negative curvature the interior is to be interpreted as all points outside the circle, going off to ∞. It is best to think of the circles as being drawn on a sphere because, for a circle on a sphere, the circle partitions the sphere into two parts, and there are two choices as to what is the interior and what is the exterior.

⁷Tiled circle defined by U.S. coins, reproduced here with the kind permission of Alex Kontorovich.




surprisingly large number of circles in any packing with curvature $\le T$, far more than $T$ (so that many circles in the packing have the same curvature, and thus the same radius).⁸ Since so many different integers appear as curvatures in any given packing, Peter Sarnak asked (and resolved) whether there are infinitely many pairs of mutually tangent circles whose curvatures are both prime numbers, the Apollonian twin prime conjecture.


This last question is accessible because we see that any given solution $v = (a, b, c, d)$ is mapped to another solution by any permutation of the four elements, as well as the matrix

$$\begin{pmatrix} -1 & 2 & 2 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

These (linear) transformations generate a subgroup, $G$, of $\mathrm{SL}(4, \mathbb{Z})$⁹ and one can proceed by considering orbits (that is, the set $\{Av : A \in G\}$ for some starting vector $v$) under the action of $G$.


Sarnak’s approach to studying the curvatures brings us back to quadratic equations: In the American coins example, we begin with the circle of curvature 28 that is tangent to the circles of curvature 21 and 24. The circle inside the crescent that is tangent to these three circles has curvature 157. Next we determine the circle that is tangent to those of curvature 21, 24, and 157, and then the next one, always using the two circles of curvature 21 and 24. So if $x_n$ is the curvature of the $n$th circle in this procedure, then $x_0 = 28$, $x_1 = 157$, and

$$x_{n+1} = 2(21 + 24 + x_n) - x_{n-1} \quad\text{for all } n \ge 1,$$

by Descartes’s equation. We can prove by induction that

$$x_n = 45n^2 + 84n + 28 \quad\text{for all } n \ge 0;$$

so the circles in our packing that are tangent to the original circles of curvatures 21 and 24 have curvature $x_n$ for each $n \ge 0$.
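The recurrence can also be checked numerically against the closed form; this is a short illustrative Python sketch (the name `tangent_chain` is ours), not a proof.

```python
def tangent_chain(a, b, x0, x1, count):
    """Curvatures x_0, x_1, ... of successive circles tangent to two fixed circles
    of curvatures a and b, via x_{n+1} = 2(a + b + x_n) - x_{n-1}."""
    xs = [x0, x1]
    while len(xs) < count:
        xs.append(2 * (a + b + xs[-1]) - xs[-2])
    return xs[:count]

print(tangent_chain(21, 24, 28, 157, 5))   # [28, 157, 376, 685, 1084]
```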


Exercise 9.15.1. Suppose that you are given three mutually tangent circles $A$, $B$, and $C_0$ of curvatures $a$, $b$, and $x_0$ in an Apollonian circle packing. For each $n \ge 0$ let $C_{n+1}$ be the circle tangent to the circles $A$, $B$, and $C_n$ that lies in the crescent between these three circles, and let $x_{n+1}$ be its curvature. Prove that

$$x_n = (a + b)(n^2 - n) + (x_1 - x_0)n + x_0 \quad\text{for all } n \ge 0.$$


Peter Sarnak developed this idea further, which we return to in appendix 12G. 




⁸The total number of circles in any given packing with curvature $\le T$ is about $cT^{\alpha}$, where $\alpha = 1.30568\ldots$, and $c$ is a positive constant that depends on the packing.
⁹$\mathrm{SL}(4, \mathbb{Z})$ is the set of 4-by-4 matrices of determinant 1 with integer entries.


Chapter 10 


Square roots and factoring 


In this chapter we will study the computational side of number theory, which plays 
an important role in several uses of computers in today’s society, particularly when 
it comes to keeping secrets. We will investigate how to rapidly determine whether 
a given large integer is prime and, if not, how to factor it. The issue of factoring 
an integer n is closely related to determining square roots mod n: 


10.1. Square roots modulo n 


How difficult is it to find square roots mod $n$? The first question to ask is how many square roots a square has mod $n$.


Lemma 10.1.1. If $n$ is an odd integer with $k$ distinct prime factors and $A$ is a square mod $n$ with $(A, n) = 1$, then there are exactly $2^k$ residues mod $n$ whose square is $\equiv A \pmod n$.


In particular, all squares mod $n$ that are coprime to $n$ have the same number of square roots mod $n$. We resolved how many square roots 1 (mod $n$) has in Lemma 3.8.1, and here we modify that proof to better suit the discussion in this chapter. We could have immediately deduced Lemma 10.1.1, for if $A$ is a square mod $n$, then there exists $b \pmod n$ such that $b^2 \equiv A \pmod n$, and then the solutions to $x^2 \equiv A \pmod n$ are in 1-to-1 correspondence with the solutions to $y^2 \equiv 1 \pmod n$ through the invertible transformation $x \equiv by \pmod n$.


Proof. Suppose that $b^2 \equiv A \pmod n$ where $n = p_1^{e_1}p_2^{e_2}\cdots p_k^{e_k}$, and the $p_i$ are odd and distinct. If $x^2 \equiv A \pmod n$, then $n \mid (x^2 - b^2) = (x - b)(x + b)$, so that $p$ divides $x - b$ or $x + b$ for each prime $p$ dividing $n$. Now $p$ cannot divide both, or else $p$ divides $(x + b) - (x - b) = 2b$ and so $4A \equiv (2b)^2 \equiv 0 \pmod p$, which contradicts the fact that $(p, 2A) \mid (n, 2A) = 1$. Hence each $p_i^{e_i}$ divides exactly one of $x - b$ and $x + b$. So let

$$d = (n, x - b), \quad\text{and therefore}\quad n/d = (n, x + b),$$





which must be coprime. Then $x \equiv b_d \pmod n$, where $b_d$ is that unique residue class mod $n$ for which

$$b_d \equiv \begin{cases} b \pmod{d}, \\ -b \pmod{n/d}. \end{cases} \tag{10.1.1}$$

Note that the $b_d$ are well-defined by the Chinese Remainder Theorem, are distinct, and that $x^2 \equiv b_d^2 \equiv b^2 \equiv A \pmod n$ for each $d$. The possible values of $d$ are $\prod_{i \in I} p_i^{e_i}$ for each subset $I$ of $\{1, \ldots, k\}$, and therefore there are $2^k$ possibilities.


To see how the proof works, let’s obtain the four square roots of 4 (mod 15) from knowing one square root, 2, and the factorization of 15. These four square roots are given by four pairs of congruences which we solve using the Chinese Remainder Theorem:

$2 \pmod 1$ and $-2 \pmod{15}$, which yield $13 \pmod{15}$;
$2 \pmod 3$ and $-2 \pmod 5$, which yield $8 \pmod{15}$;
$2 \pmod 5$ and $-2 \pmod 3$, which yield $7 \pmod{15}$; and
$2 \pmod{15}$ and $-2 \pmod 1$, which yield $2 \pmod{15}$.
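The construction of the $b_d$ in (10.1.1) translates directly into code. In this Python sketch (the helper names are hypothetical), `prime_powers` lists the coprime factors $p_i^{e_i}$ of $n$, and each choice of signs decides whether $p_i^{e_i}$ goes into $d$ or into $n/d$; it assumes $(b, n) = 1$ and uses `pow(m, -1, q)` (Python 3.8+) for modular inverses.

```python
from itertools import product

def crt_pair(r1, m1, r2, m2):
    """The unique x mod m1*m2 with x = r1 (mod m1) and x = r2 (mod m2),
    assuming gcd(m1, m2) = 1."""
    if m1 == 1:
        return r2 % m2
    inv = pow(m1, -1, m2)            # inverse of m1 modulo m2
    return (r1 + m1 * ((r2 - r1) * inv % m2)) % (m1 * m2)

def square_roots_from_one(b, prime_powers):
    """All b_d with b_d = b (mod d) and b_d = -b (mod n/d), where n is the product
    of the pairwise coprime odd prime powers in `prime_powers` and d runs over the
    products of subsets of them.  Assumes gcd(b, n) = 1."""
    roots = set()
    for signs in product((1, -1), repeat=len(prime_powers)):
        # sign +1 puts that prime power into d, sign -1 puts it into n/d
        x, m = 0, 1
        for sign, q in zip(signs, prime_powers):
            x = crt_pair(x, m, sign * b, q)
            m *= q
        roots.add(x)
    return sorted(roots)

print(square_roots_from_one(2, [3, 5]))   # [2, 7, 8, 13]
```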


Consequence. Let $n$ be an odd integer with at least two different prime factors, and suppose that $b^2 \equiv A \pmod n$ with $(A, n) = 1$. Finding square roots of $A$ mod $n$, other than $b$ and $-b$, is “as difficult as” factoring $n$ into two parts both $> 1$.

Sketch of “proof”. If we have a factorization $n = d \cdot n/d$ with $(d, n/d) = 1$, then we select $b_d$ as in (10.1.1) so that $b_d^2 \equiv A \pmod n$ but $b_d \not\equiv \pm b \pmod n$, as $d, n/d > 1$.


In the other direction, suppose that one has a fast algorithm for finding arbitrary square roots mod $n$ for odd integers $n$. In particular, given $A \pmod n$, the algorithm randomly determines some $x \pmod n$ for which $x^2 \equiv A \pmod n$; by “random” we mean that each time the “square root finding” algorithm is run it is equally likely to produce any one of the $2^k$ solutions (as in Lemma 10.1.1). Now define $d = (n, x - b)$ (as in the proof of Lemma 10.1.1), and so we factor $n$ as $d \cdot n/d$. This works provided $d \ne 1$ or $n$, that is, provided that $x \not\equiv b$ or $-b \pmod n$.

Now, the probability that $x \equiv b$ or $-b \pmod n$ is $2/2^k$, which is $\le \tfrac12$ as $k \ge 2$. Therefore the probability of finding a non-trivial factor of $n$ each time the “square root finding” algorithm is run is $\ge \tfrac12$. This does not seem persuasive, but if we run the “square root finding” algorithm 20 times, then the probability that the algorithm gives 1 or $n$ on every run is $\le (\tfrac12)^{20}$, which is less than one in a million. So, in practice, we will quickly find a non-trivial factor of $n$.


We have shown that finding square roots mod n and factoring n are more or 
less equally difficult problems. 
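The reduction in the other direction can be simulated. Since no fast square-root algorithm is available, `random_sqrt_oracle` below is a hypothetical stand-in that finds all square roots by brute force and returns one at random; the point of this Python sketch is only the gcd step $d = (n, x - b)$ and the repeated trials.

```python
import random
from math import gcd

def random_sqrt_oracle(A, n, rng):
    """Hypothetical stand-in for a fast square-root algorithm: returns a uniformly
    random x with x^2 = A (mod n).  It uses brute force, so small n only."""
    roots = [x for x in range(n) if (x * x - A) % n == 0]
    return rng.choice(roots)

def factor_with_sqrt_oracle(n, tries=20, seed=0):
    """Try to split odd n (with at least two distinct prime factors) using the oracle."""
    rng = random.Random(seed)
    for _ in range(tries):
        b = rng.randrange(1, n)
        g = gcd(b, n)
        if g > 1:
            return g                 # lucky: b already shares a factor with n
        A = b * b % n
        x = random_sqrt_oracle(A, n, rng)
        d = gcd(n, x - b)            # d = (n, x - b), as in Lemma 10.1.1
        if 1 < d < n:
            return d                 # this fails only when x = b or -b (mod n)
    return None

print(factor_with_sqrt_oracle(3 * 5 * 11 * 13))
```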


Exercise 10.1.1. Find all of the square roots of 49 mod $3^2 \cdot 5 \cdot 11$.
10.2. Cryptosystems 


Cryptography has been around for as long as the need to communicate secrets at a distance. Julius Caesar, on campaign, communicated military messages by creating ciphertext from plaintext (the unencrypted message), replacing each letter of the plaintext with the letter three places further on in the alphabet. Thus A becomes D, B becomes E, etc. For example,

thisisveryinteresting
becomes
wklvlvyhublqwhuhvwlqj

(Y became B, since we wrap around to the beginning of the alphabet. It is essentially the map $x \mapsto x + 3 \pmod{26}$.) At first sight an enemy might regard WKLV...WLQJ as gibberish even if the message was intercepted. It is easy enough to decrypt the ciphertext, simply by going back three places in the alphabet for each letter, to reconstruct the original message. The enemy could easily do this if (s)he guessed that the key is to rotate the letters by three places in the alphabet, or even if they only guessed that one rotates by a fixed number of letters, as there would only be 25 possibilities to try. So in classical cryptography it is essential to keep the key secret, as well as the technique by which the key was created.¹
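The Caesar shift $x \mapsto x + 3 \pmod{26}$ and the 25-key brute-force attack fit in a few lines. This Python sketch is our own and handles lowercase letters only.

```python
import string

ALPHABET = string.ascii_lowercase

def caesar(text, shift):
    """Rotate each lowercase letter by `shift` places: x -> x + shift (mod 26)."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] for ch in text)

ciphertext = caesar("thisisveryinteresting", 3)
print(ciphertext)                  # wklvlvyhublqwhuhvwlqj
print(caesar(ciphertext, -3))      # thisisveryinteresting

# An enemy who suspects a rotation need only try the 25 non-trivial shifts:
candidates = {s: caesar(ciphertext, -s) for s in range(1, 26)}
print(candidates[3])               # thisisveryinteresting
```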


One can generalize to arbitrary substitution ciphers, where one replaces the alphabet by some permutation of the alphabet. There are 26! permutations of our alphabet, which is around $4 \times 10^{26}$ possibilities, enough one might think to be safe. And it would be if the enemy went through each possibility, one at a time. However the clever cryptographer will look for patterns in the ciphertext. In the above short ciphertext we see that L appears four times among the 21 letters, and H, V, W three times each, so it is likely that these letters each represent one of A, E, I, S, T. By looking for multiword combinations (like the ciphertext for THE) one can quickly break any ciphertext of around one hundred letters.
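Counting letters is the first step of such an attack. Here is a tiny Python sketch (our own) applied to the shift-by-three encryption of “thisisveryinteresting”:

```python
from collections import Counter

ciphertext = "wklvlvyhublqwhuhvwlqj"     # "thisisveryinteresting" shifted by 3
counts = Counter(ciphertext)
print(len(ciphertext), counts["l"], counts["h"], counts["v"], counts["w"])   # 21 4 3 3 3
```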


To combat this, armies in the First World War used longer cryptographic keys, rather than keys of length 1. That is, they would take a word like ABILITY and since A is letter 1 in the alphabet, B is letter 2, and ILITY are letters 9, 12, 9, 20, 25, respectively, they would rotate on through the alphabet by 1, 2, 9, 12, 9, $-6$, $-1$ letters to encrypt the first seven letters, and then repeat this process on the next seven. For example, we begin with the message, adding the word “ability” as often as is needed:

weneedtomakeanexample
plus
abilityabilityability
becomes
xgwqnxspojwnumfzjyyfd
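The repeated-key scheme can be sketched the same way, using the convention above that a key letter shifts by its position in the alphabet (A = 1, ..., Z = 26). The function name and the `sign` parameter are our own choices.

```python
import string

ALPHABET = string.ascii_lowercase

def vigenere(text, key, sign=1):
    """Shift the i-th letter of `text` by the position (a = 1, ..., z = 26) of the
    i-th letter of the repeated key; sign=-1 undoes the encryption."""
    out = []
    for i, ch in enumerate(text):
        shift = ALPHABET.index(key[i % len(key)]) + 1    # a -> 1, b -> 2, ...
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
    return "".join(out)

c = vigenere("weneedtomakeanexample", "ability")
print(c)                                 # xgwqnxspojwnumfzjyyfd
print(vigenere(c, "ability", sign=-1))   # weneedtomakeanexample
```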


This can again be “broken” by statistical analysis, though the longer the key length, 
the harder it is to do. Of course using a long key on a battlefield would be difficult, 
so one needed to compromise between security and practicality. A one-time pad, 


¹Steganography, hiding secrets in plain view, is another method for communicating secrets at a
distance. In 499 B.C., Histiaeus shaved the head of his most trusted slave, tattooed a message on his 
bald head, and then sent the slave to Aristagoras, once the slave’s hair had grown back. Aristagoras then 
shaved the slave’s head again to recover the secret message telling him to revolt against the Persians. In 
more recent times, cold war spies reportedly used “microdots” to transmit information, and Al-Qaeda 
supposedly notified its terrorist cells via messages hidden in images on certain webpages. 




where one uses such a long key that one never repeats a pattern, is unbreakable by 
statistical analysis. This might have been used by spies during the cold war and 
was perhaps based on the letters in an easily obtained book, so that the spy would 
not have to possess any obviously incriminating evidence. 


During the Second World War the Germans came up with an extraordinary substitution cipher that involved changing several settings on a specially built typewriter (an Enigma machine). The number of possibilities was so large that the Germans remained confident that it could not be broken, and they even changed the settings every day so as to ensure that it would be extremely difficult. The Poles managed to obtain an early Enigma machine and their mathematicians determined how it worked. They shared their findings with the Allies so that after a great amount of effort the Allies were able to break German codes quickly enough to be useful, even vital, to their planning and strategy.² Early successes led to the Germans becoming more cautious, and thence to horrific decisions having to be made by the Allied leaders to safeguard this most precious secret.³


The Allied cryptographers would cut down the number of possibilities (for 
the settings on the Enigma machine) to a few million, and then their challenge 
became to build a machine to try out many possibilities very rapidly. Up until then 
one would have to change, by hand, external settings on the machine to try each 
possibility; it became a goal to create a machine in which one could change what 
it was doing, internally, by what became known as a program, and this stimulated, 
in part, the creation of the first modern computers. 


Exercise 10.2.1. One can also create a cryptosystem using binary addition. For example, our key could be the 20-letter word $k = 10111011101111011001$. Then we could encrypt by using bit-by-bit addition; that is, $0 \oplus 0 = 1 \oplus 1 = 0$ and $0 \oplus 1 = 1 \oplus 0 = 1$. Therefore if the plaintext is $p = 11100010101101000011$, then $c = p \oplus k$, namely

  10111 01110 11110 11001
⊕ 11100 01010 11010 00011
= 01011 00100 00100 11010.

It is easy to recover the plaintext since $p = c \oplus k$. Prove that one can recover the key if one knows the ciphertext and the plaintext.
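The worked example can be checked with Python’s integer XOR operator `^`. This is our own sketch; it verifies only the arithmetic shown and leaves the exercise itself to the reader.

```python
k = int("10111011101111011001", 2)
p = int("11100010101101000011", 2)
c = p ^ k                         # bit-by-bit addition mod 2
print(format(c, "020b"))          # 01011001000010011010
print(c ^ k == p)                 # True: decryption is p = c XOR k
```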


10.3. RSA 


In the theory of cryptography we always have two (imaginary) people, Alice and 
Bob, attempting to share a secret over an open communication channel, and the 
evil Oscar listening in, attempting to figure out what the message says. We will 
begin by describing a private key scheme for exchanging secrets based on the ideas 
in our number theory course: 


Suppose that a prime $p$ is given, together with integers $d$ and $e$ such that $de \equiv 1 \pmod{p - 1}$. Alice knows $p$ and $e$ but not $d$, whereas Bob knows $p$ and $d$ but not $e$. The numbers

²As portrayed, rather inaccurately, in the film The Imitation Game.

³The ability to crack the Enigma code allowed the Allied leaders to save lives. However if they
used it so often that every possible life was saved, the Germans would have realized that the Allies 
had broken the code, and then the Germans were liable to have moved on to a different cryptographic 
method, which perhaps the Allied codebreakers might have been unable to decipher. Hence the Allied 
leadership was forced to use its knowledge sparingly so that it would be available in the militarily most 
advantageous situations. As a consequence, they knowingly sent many sailors to their doom, knowing 
where the U-boats were waiting in ambush, but being forced not to disclose that information. 




$d$ and $e$ are kept secret by whoever knows them. Thus if Alice’s secret message is $M$,⁴ she encrypts $M$ by computing $x \equiv M^e \pmod p$. She sends the ciphertext $x$ over the open channel. Then Bob decrypts by raising $x$ to the $d$th power mod $p$, since

$$x^d \equiv (M^e)^d = M^{de} \equiv M \pmod p$$

as $de \equiv 1 \pmod{p - 1}$. As far as we know, Oscar will discover little by intercepting the encrypted messages $x$, even if he intercepts many different $x$, and even if he can occasionally make an astute guess at $M$. However, if Oscar is able to steal the values of $p$ and $e$ from Alice, he will be able to determine $d$, since $d$ is the inverse of $e$ mod $p - 1$, and this can be determined by the Euclidean algorithm, as discussed in exercise 3.5.5 (see the second proof of Corollary 3.5.2). He is then able to decipher Alice’s future secret messages, in the same way as Bob does.
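A toy version of this private key scheme, with a small prime chosen purely for illustration, shows both the decryption and why stealing $p$ and $e$ is fatal: the inverse that defines $d$ is a single `pow(e, -1, p - 1)` call (Python 3.8+) for anyone who knows $p$ and $e$.

```python
p = 1019                 # a prime, chosen only for illustration
e = 3                    # Alice's exponent; we need gcd(e, p - 1) = 1
d = pow(e, -1, p - 1)    # Bob's exponent: the inverse of e mod p - 1

M = 500                  # the message, encoded as a number below p
x = pow(M, e, p)         # Alice encrypts: x = M^e (mod p)
print(d, pow(x, d, p))   # 679 500
```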


This is the problem with most classical cryptosystems: once one knows the encryption method it is not difficult to determine the decoding method. In 1975 Diffie and Hellman proposed a sensational idea: Can one find a cryptographic scheme in which the encryption method gives no help in determining a decryption method? If one could, one would then have a public key cryptographic scheme, which is exactly what is needed in our age of electronic information, in particular allowing people to use passwords in public places (for instance when using an ATM) without fear that any lurking Oscar will be able to figure out how to impersonate them.⁵


In 1977 Rivest, Shamir, and Adleman (RSA) realized this ambition, via a minor variation of the above private key cryptosystem.⁶ Now let $p \ne q$ be two large primes⁷ and $n = pq$. Select integers $d$ and $e$ such that $de \equiv 1 \pmod{\phi(pq)}$. Alice knows $pq$ and $e$ but not $d$, while Bob knows $pq$ and $d$. Thus if Alice’s secret message is $M$, the ciphertext is $x \equiv M^e \pmod{pq}$, and Bob decrypts this by taking $x^d \equiv (M^e)^d = M^{de} \equiv M \pmod{pq}$ as $de \equiv 1 \pmod{\phi(pq)}$, using Euler’s Theorem.
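Here is the RSA scheme with the toy primes $p = 61$ and $q = 53$ (real moduli are hundreds of digits long). The values are our own illustrative choices, and this sketch omits the message padding that practical implementations add.

```python
from math import gcd

p, q = 61, 53                 # toy primes
n = p * q                     # public modulus, 3233
phi = (p - 1) * (q - 1)       # 3120, known only to whoever knows p and q
e = 17                        # public exponent
assert gcd(e, phi) == 1
d = pow(e, -1, phi)           # private exponent, 2753

M = 65
x = pow(M, e, n)              # Alice encrypts
print(x, pow(x, d, n))        # 2790 65
```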

Now, if Oscar steals the values of $pq$ and $e$ from Alice, will he be able to determine $d$, the inverse of $e$ mod $\phi(pq) = (p - 1)(q - 1)$? When the modulus was the prime $p$, Oscar had no difficulty in determining $\phi(p) = p - 1$. Now that the modulus is $pq$, can Oscar easily determine $(p - 1)(q - 1)$? If so, then, since he already knows $pq$, he would be able to determine $pq + 1 - (p - 1)(q - 1) = p + q$ and hence $p$ and $q$, since they are the roots of $x^2 - (p + q)x + pq = 0$. In practice, Oscar needs only to know $d$ to factor $n$ (see exercise 5.27 in [CP05]⁸). In other words, if Oscar can “break” the RSA algorithm, then he can factor $n = pq$, and vice versa.
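The step from $\phi(pq)$ to $p$ and $q$ is just the quadratic formula. A minimal Python sketch, with the hypothetical helper name `factor_from_phi`:

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover p >= q from n = pq and phi = (p - 1)(q - 1),
    as the roots of x^2 - (p + q)x + pq = 0."""
    s = n + 1 - phi                  # s = p + q
    disc = s * s - 4 * n             # (p - q)^2
    r = isqrt(disc)
    assert r * r == disc, "phi is not (p - 1)(q - 1) for a product of two primes"
    return (s + r) // 2, (s - r) // 2

print(factor_from_phi(3233, 3120))   # (61, 53)
```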


We have just shown that breaking RSA is more or less as difficult as factoring. 
Therefore RSA is a secure cryptographic protocol (when correctly implemented) 
if and only if n is a difficult integer to factor. But nobody truly knows whether 


⁴Of course a message is usually in words, but one converts the letters to numbers using some simple
substitutions, like “01” for “A”, “02” for “B”,... , “26” for “Z”, etc., and concatenates these numbers. 
Thus “cabbie” becomes “030102020905”. It is this number that is our message that we denote by M. 

⁵When Alice uses a password, a cryptographic protocol might append a timestamp to ensure that
the encrypted password (plus timestamp) is different with each use, and so Bob will get suspicious if 
the same timestamp is used again later. 

⁶It is now known that (Sir) Clifford Cocks, working for the British secret cryptography agency, GCHQ, had discovered this RSA algorithm in 1973, and it had been classified “Top Secret”. See https://www.wired.com/1999/04/crypto/ for the story.

⁷We will develop fast methods to find large primes in appendix 10C.

⁸This uses Pollard’s $p - 1$ method, which will not be discussed in this book, and is an algorithm
that runs in probabilistic polynomial time. 




factoring is a difficult problem, nor how to select integers that are provably hard to factor. In our current state of knowledge, we do not know any very efficient ways to factor arbitrary large numbers, but that does not necessarily mean that there is no quick way to do so.⁹ So why do we put our faith (and secrets and fortunes) in the difficulty of factoring? The security of a cryptographic protocol must evidently be based on the difficulty of resolving some mathematical problem,¹⁰ but we do not know how to prove that any particular mathematical problem is necessarily difficult to solve.¹¹ However the problem of factoring efficiently has been studied by many of the greatest minds in history, from Gauss onwards, who have looked for an efficient factoring algorithm and failed. Is this a good basis to have faith in RSA? Probably not, but we have no better. (More on this at the end of section of appendix 10F.)


Exercise 10.3.1. Let $n = 11 \times 53$ be an RSA modulus with encryption exponent $e = 7$. Determine
d, the decryption exponent, by hand, using the Euclidean algorithm and the Chinese Remainder 
Theorem. 


Exercise 10.3.2. Let n = 5891 be an RSA modulus with encryption exponent e = 29 and 
decryption exponent d = 197. Use this information to factor n. 


10.4. Certificates and the complexity classes P and NP 


Algorithms are typically designed to work on any of an arbitrarily large class of examples, and one wishes them to work as fast as possible. If the example is input in $\ell$ characters, and the function calculated is genuinely a function of all the characters of the input, then one cannot hope to compute the answer any quicker than the length, $\ell$, of the input. A polynomial time algorithm is one in which the answer is computed in no more than $c\ell^A$ steps, for some constants $c, A > 0$, no matter what the input. These are considered to be quick algorithms. There are many simple problems that can be answered in polynomial time (the set of such problems is denoted by P and was already discussed in section 7.14 of appendix 7A); see section of appendix 10F for more details. In modern number theory, because of the intrinsic interest as well as because of the applications to cryptography, we are particularly interested in the running times of factoring and primality testing algorithms.


At the 1903 meeting of the American Mathematical Society, F. N. Cole came to the blackboard and, without saying a word, wrote down

$$2^{67} - 1 = 147573952589676412927 = 193707721 \times 761838257287,$$

long-multiplying the numbers out on the right side of the equation to prove that he was indeed correct. Afterwards he said that figuring this out had taken him “three years of Sundays”. The moral of this tale is that although it took Cole a great deal


⁹There are some families of numbers that we know are easy to factor (for example, see exercise 10.7.2 for a fast factoring method if $p$ and $q$ are close together), so we need to avoid those when selecting a modulus for RSA.

¹⁰Here we are talking about cryptographic protocols on computers as we know them today. There
is a highly active quest to create quantum computers, on which cryptographic protocols are based on a 
very different set of ideas. 

¹¹We can prove that almost all mathematical problems are “difficult to solve” (see section
of appendix 10F), but we do not know how to identify one specific problem that is provably difficult to 
solve. This is a notoriously challenging and important open problem. 




of work and perseverance to find these factors, it did not take him long to justify 
his result to a room full of mathematicians (and, indeed, to give a proof that he 
was correct). Thus we see that one can provide a short proof, even if finding that 
proof takes a long time. 
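Checking Cole’s certificate today takes a fraction of a second; in Python, whose integers have arbitrary precision, it is two multiplications and comparisons:

```python
n = 2**67 - 1
p, q = 193707721, 761838257287
print(n == 147573952589676412927, p * q == n)   # True True
```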


In general one can exhibit factors of a given integer $n$ to give a short proof that $n$ is composite. Such proofs, which can be checked in polynomial time, are called certificates. (The set of problems for which the answer can be checked in polynomial time is denoted by NP.) Note that it is not necessary to exhibit factors to give a short proof that a number is composite. Indeed, we already saw in the converse to Fermat’s Little Theorem, Corollary 7.2.1, that one can exhibit an integer $a$ coprime to $n$ for which $n$ does not divide $a^{n-1} - 1$ to provide a certificate that $n$ is composite.
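A Fermat certificate of compositeness can be searched for as follows. In this Python sketch, the name `fermat_witness` and the search bound are our own choices; it returns the first base $a$ that works. For a Carmichael number such as $561 = 3 \cdot 11 \cdot 17$ it finds none, which is exactly the obstruction discussed next.

```python
from math import gcd

def fermat_witness(n, bound=1000):
    """Smallest a in [2, min(n, bound)) with gcd(a, n) = 1 and a^(n-1) != 1 (mod n);
    such an a certifies that n is composite.  Returns None if there is none."""
    for a in range(2, min(n, bound)):
        if gcd(a, n) == 1 and pow(a, n - 1, n) != 1:
            return a
    return None

print(fermat_witness(91))     # 2
print(fermat_witness(561))    # None: 561 = 3 * 11 * 17 is a Carmichael number
```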


What about primality testing? If someone gives you an integer and asserts that it is prime, can you quickly check that this is so? Can they give you better evidence than their say-so that it is a prime number? Can they provide some sort of certificate that gives you all the information you need to quickly verify that the number is indeed a prime? We had hoped (see section 7.6) that we could use the converse of Fermat’s Little Theorem to establish a quick primality test, but we saw that Carmichael numbers seem to stop that idea from reaching fruition. Here we are asking for less, for a short certificate for a proof of primality. It is not obvious how to construct such a certificate, certainly not so obvious as with the factoring problem. It turns out that some old remarks of Lucas from the 1870s can be modified for this purpose. We begin with a sure-fire primality test, obtained as a consequence of Proposition 7.5.1:


Corollary 10.4.1. Suppose that $n > 1$ is a positive integer for which there exists
an integer $g$ with $(g,n) = 1$ such that $g^{n-1} \equiv 1 \pmod n$ and $g^{(n-1)/q} \not\equiv 1 \pmod n$
for every prime $q$ dividing $n-1$. Then $n$ is a prime.


Proof. Proposition 7.5.1 implies that $g$ has order $n-1 \pmod n$, so that the $n-1$
reduced residues $1, g, \dots, g^{n-2}$ are all distinct mod $n$. Therefore $\phi(n) \ge n-1$, so every integer $a$ in
the range $1 \le a \le n-1$ is coprime to $n$, implying that $n$ is prime.
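A small worked instance of Corollary 10.4.1 may help. The choice $n = 23$, $g = 5$ is ours and does not come from the text. Here $n - 1 = 22 = 2 \cdot 11$, and the required checks are

$$5^{22} \equiv 1, \qquad 5^{22/2} = 5^{11} \equiv -1 \not\equiv 1, \qquad 5^{22/11} = 5^{2} \equiv 2 \not\equiv 1 \pmod{23},$$

so 5 has order 22 modulo 23, and the corollary shows that 23 is prime.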


We are not suggesting that Corollary 10.4.1 provides a fast primality test. One
can probably find $g$ rapidly, if it exists, using Gauss’s algorithm which is discussed
in section 7.15 of appendix 7B. However the algorithm requires one to completely
factor $n - 1$, and we have no particularly fast factoring algorithms. On the other
hand, if $n - 1$ has already been factored, then one can proceed rapidly. Indeed
we can provide a “certificate” to allow a checker to quickly verify that $n$ is prime,
which would consist of

$$g \quad\text{and}\quad \{q \text{ prime} : q \text{ divides } n - 1\}.$$

The checker would need to verify that $g^{n-1} \equiv 1 \pmod n$ whereas $g^{(n-1)/q} \not\equiv 1 \pmod n$
for all primes $q$ dividing $n-1$, something that can be quickly accomplished
using fast exponentiation (as explained in section 7.13 of appendix 7A).

There is a problem though: one needs (additional) certification that each
such $q$ is prime. The solution is to iterate the above algorithm, and one can show
that no more than $\log n$ odd primes need to be certified prime in the process of
proving that $n$ is prime. Thus we have a “short” certificate that $n$ is prime.
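The iterated certificate can be checked recursively. This is a minimal Python sketch under our own assumptions about the data format: the hypothetical dictionary `cert` maps each odd prime $p$ to be certified to a pair $(g, [q_1, q_2, \dots])$ giving $g$ and the distinct primes dividing $p - 1$, and the prime 2 is taken as known. Besides the two congruences of Corollary 10.4.1, the checker must confirm that the listed primes account for all of $p - 1$; otherwise a prime factor of $p - 1$ could have been silently omitted.

```python
def verify_certificate(n: int, cert: dict) -> bool:
    """Recursively check a Lucas-style primality certificate for n.

    cert (hypothetical format): {p: (g, [distinct primes dividing p - 1])}
    for every odd prime p that needs certifying; 2 is accepted as prime.
    """
    if n == 2:
        return True
    if n < 2 or n not in cert:
        return False
    g, qs = cert[n]
    # The listed primes must account for all of n - 1.
    rest = n - 1
    for q in qs:
        if q < 2 or rest % q != 0:
            return False
        while rest % q == 0:
            rest //= q
    if rest != 1:
        return False
    # Corollary 10.4.1: g^(n-1) = 1 but g^((n-1)/q) != 1 (mod n) for each prime q | n - 1.
    if pow(g, n - 1, n) != 1:
        return False
    if any(pow(g, (n - 1) // q, n) == 1 for q in qs):
        return False
    # Each q < n must itself be certified prime, so the recursion terminates.
    return all(verify_certificate(q, cert) for q in qs)
```

For instance, `{23: (5, [2, 11]), 11: (2, [2, 5]), 5: (2, [2])}` certifies that 23 is prime, using only the primality of 2.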


372 10. Square roots and factoring 


At first one might hope that this also provides a quick way to test whether a 
given integer n is prime. However there are several obstacles. The most important 
is that we need to factor n — 1 in creating the certificate. When one is handed 
the certificate, n — 1 is already factored, so that is not an obstacle to the use of 
the certificate; however it is a fundamental impediment to the rapid creation of the 
certificate (and therefore to using this as a primality test). 


Exercise 10.4.1. Assuming only that 2 is prime, provide a certificate that proves that 107 is 
prime. 


Exercise 10.4.2. Let $F_m = 2^{2^m} + 1$ with $m \ge 2$ be a Fermat number.
(a) Prove that if there exists an integer $g$ for which $g^{(F_m-1)/2} \equiv -1 \pmod{F_m}$, then $F_m$ is prime.
(b) Deduce an “if and only if” condition for the primality of $F_m$ using exercise 8.5.4.


10.5. Polynomial time primality testing 


Although the converse to Fermat’s Little Theorem does not provide a polynomial
time primality test, one can further develop this idea. For example, we know that
$a^{(p-1)/2} \equiv -1$ or $1 \pmod p$ by Euler’s criterion, and hence if $a^{(n-1)/2} \not\equiv \pm 1 \pmod n$,
then $n$ is composite. This identifies even more composite $n$ than Corollary 7.2.1
alone, but not necessarily all $n$. We develop this idea further in section 10.8 of
appendix 10A to find a criterion of this type that is satisfied by all primes but not
by any composites. However we are unable to prove that this is indeed a polynomial
time primality test without making certain assumptions that are, as yet, unproved.
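The Euler-criterion refinement is a one-line check. The following is a minimal Python sketch (the function name is ours). It reports `True` only when $a^{(n-1)/2}$ is neither $1$ nor $-1 \pmod n$, which proves that $n$ is composite. A `False` result proves nothing.

```python
def euler_test_proves_composite(n: int, a: int) -> bool:
    """For odd n > 2 and 1 < a < n - 1: True if a^((n-1)/2) is neither 1 nor -1 mod n.

    By Euler's criterion this cannot happen when n is prime.
    """
    r = pow(a, (n - 1) // 2, n)
    return r != 1 and r != n - 1
```

For example, with $n = 341$ the base 2 gives $2^{170} \equiv 1$, so it detects nothing, but the base 3 exposes 341 as composite.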


There have indeed been many ideas for establishing a primality test which 
is provably polynomial time, but this was not achieved until 2002. This was of 
particular interest since the proof was given by a professor, Manindra Agrawal, and 
two undergraduate students, Kayal and Saxena, working together with Agrawal 
on a summer research project. Their algorithm is based on the following elegant 
characterization of prime numbers. 


Theorem 10.1 (Agrawal, Kayal, and Saxena (AKS)). For a given integer $n \ge 2$, let
$r$ be a positive integer $< n$ for which $n$ has order $> (\log n)^2$ modulo $r$. Then $n$ is
prime if and only if

• $n$ is not a perfect power,
• $n$ does not have any prime factor $\le r$,
• $(x+a)^n \equiv x^n + a \mod (n, x^r - 1)$ for each integer $a$, $1 \le a \le \sqrt{r}\,\log n$.


The last equation uses “modular arithmetic” in a way that is new to us, but
analogous to what we have seen: $(x+a)^n \equiv x^n + a \mod (n, x^r - 1)$ means that
there exist $f(x), g(x) \in \mathbb{Z}[x]$ such that $(x+a)^n - (x^n + a) = n f(x) + (x^r - 1) g(x)$.
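Computing modulo $(n, x^r - 1)$ amounts to reducing coefficients mod $n$ and exponents mod $r$. The following naive Python sketch checks only the third condition of Theorem 10.1 for one value of $a$. It does not choose $r$, and it does not test the first two conditions. The function names are ours. Polynomials are stored as length-$r$ coefficient lists, and the power is computed by square-and-multiply. Each product costs $O(r^2)$ operations, so this is a toy and not an optimized AKS implementation.

```python
def polymul_mod(f, g, n, r):
    """Product of f and g (length-r coefficient lists) in (Z/nZ)[x]/(x^r - 1)."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                if gj:
                    k = (i + j) % r  # x^r = 1, so exponents wrap around mod r
                    h[k] = (h[k] + fi * gj) % n
    return h


def aks_congruence_holds(n, a, r):
    """Check (x + a)^n == x^n + a  mod (n, x^r - 1), for n >= 2 and r >= 1."""
    base = [0] * r
    base[0] = a % n
    base[1 % r] = (base[1 % r] + 1) % n  # the polynomial x + a
    result = [0] * r
    result[0] = 1
    e = n
    while e:  # square-and-multiply on the exponent n
        if e & 1:
            result = polymul_mod(result, base, n, r)
        base = polymul_mod(base, base, n, r)
        e >>= 1
    target = [0] * r
    target[n % r] = 1  # x^n reduces to x^(n mod r)
    target[0] = (target[0] + a) % n
    return result == target
```

For the prime 31 the congruence holds for every $a$ and $r$. For $n = 15$ with $r = 17 > n$ no reduction by $x^r - 1$ occurs, and it fails because $15$ does not divide $\binom{15}{5}$.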

At first sight this might seem to be a rather complicated characterization of the 
prime numbers. However this fits naturally into the historical progression of ideas 
in this subject (indeed, see appendix 10G for a discussion and a proof), is not so 
complicated (compared to some other ideas in use), and has the great advantage 
that it is straightforward to develop into a fast algorithm for proving the primality 
of large primes. However, although the AKS algorithm satisfies the desire to have a
rigorously proved polynomial time primality testing algorithm, it is not in practice
the fastest algorithm for establishing primality of the largest integers currently
being considered.¹²


Exercise 10.5.1. Let $p^k$ be the highest power of prime $p$ that divides $n$, with $k \ge 1$.
(a) Prove that $p^k$ does not divide $\binom{n}{p}$.
(b) Deduce that $n$ does not divide $\binom{n}{p}$.
(c) Show that if $n$ is composite, then $n$ does not divide all the coefficients of the polynomial

$$(1+x)^n - x^n - 1.$$


Exercise 10.5.2. Use the previous exercise to show:
(a) $n$ is prime if and only if $(x+1)^n \equiv x^n + 1 \pmod n$.
(b) If $(n,a) = 1$, then $n$ is prime if and only if $(x+a)^n \equiv x^n + a \pmod n$.
(c) Prove that if $n$ is prime, then $(x+a)^n \equiv x^n + a \pmod{(n, x^r - 1)}$ for any integer $a$ with
$(a,n) = 1$ and any $r \ge 1$.


10.6. Factoring methods 


The problem of distinguishing prime numbers from composite numbers and 
of resolving the latter into their prime factors is known to be one of the most 
important and useful in arithmetic. It has engaged the industry and wisdom of 
ancient and modern geometers to such an extent that it would be superfluous to 
discuss the problem at length. Nevertheless we must confess that all methods 
that have been proposed thus far are either restricted to very special cases 
or are so laborious and difficult that even for numbers that do not exceed 
the limits of tables constructed by estimable workers, they try the patience 
of even the practiced calculator. And these methods do not apply at all to
larger numbers .... It frequently happens that the trained calculator will
be sufficiently rewarded by reducing large numbers to their factors so that it
will compensate for the time spent. Further, the dignity of the science itself
seems to require that every possible means be explored for the solution of a
problem so elegant and so celebrated .... It is in the nature of the problem
that any method will become more complicated as the numbers get larger.
Nevertheless, in the following methods the difficulties increase rather slowly.
The techniques that were previously known would require intolerable
labor even for the most indefatigable calculator.
— from article 329 of Disquisitiones Arithmeticae (1801) by C. F. Gauss


The first factoring method, other than trial division, was given by Fermat: His
goal was to write a given odd integer $n$ as $x^2 - y^2$, so that $n = (x-y)(x+y)$. He
started with $m$, the smallest integer $\ge \sqrt{n}$, and then looked to see if $m^2 - n$ is a
square. If so, say $m^2 - n = r^2$, then $n = (m-r)(m+r)$.

It is not easy to determine (at least by hand) whether a large integer is a square,
though most are not. Fermat simplified his algorithm by quickly eliminating non-squares,
by testing whether $m^2 - n$ is a square modulo various small primes. If
$m^2 - n$ is not a square, then he tested whether $(m+1)^2 - n$ is a square; if that
failed, whether $(m+2)^2 - n$ is a square, or $(m+3)^2 - n, \dots$, etc. Since Fermat
computed by hand he also noted the trick that

$$(m+1)^2 - n = (m^2 - n) + (2m+1),$$
$$(m+2)^2 - n = ((m+1)^2 - n) + (2m+3), \quad \text{etc.},$$


12 Because other algorithms, which we believe (but cannot prove) run in polynomial time, are faster.


so that, at each step he only needed to add a relatively small number to the integer 
he had just tested, and the next add-on is just two larger than the previous one. 


For example, Fermat factored $n = 2027651281$, so that $m = 45030$. Then
$45030^2 - n = 49619$, which is not a square mod 100;
$45031^2 - n = 49619 + 90061 = 139680$, which is divisible by $2^5$ but not $2^6$;
$45032^2 - n = 139680 + 90063 = 229743$, which is divisible by $3^3$ but not $3^4$;
$45033^2 - n = 229743 + 90065 = 319808$, which is not a square mod 3; etc.,

up until $45041^2 - n = 1020^2$, so that
$$n = 2027651281 = 45041^2 - 1020^2 = (45041 - 1020) \times (45041 + 1020) = 44021 \times 46061.$$
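Fermat's procedure, including his trick of adding $2m+1$, then $2m+3$, and so on, translates directly into a loop. The following is a minimal Python sketch (the name `fermat_factor` is ours). It uses `math.isqrt` for the exact square test where Fermat used congruences to small moduli, and it assumes $n$ is odd and greater than 1, so that the loop terminates: at worst at $m = (n+1)/2$.

```python
from math import isqrt


def fermat_factor(n: int) -> tuple[int, int]:
    """Fermat's method for odd n > 1: return (m - r, m + r) with n = m^2 - r^2."""
    m = isqrt(n)
    if m * m < n:
        m += 1  # smallest integer m with m^2 >= n
    d = m * m - n  # current value of m^2 - n
    step = 2 * m + 1  # (m+1)^2 - n = (m^2 - n) + (2m + 1)
    while True:
        r = isqrt(d)
        if r * r == d:
            return m - r, m + r
        d += step  # move on to (m+1)^2 - n
        step += 2  # the next add-on is two larger
        m += 1
```

Applied to Fermat's example, `fermat_factor(2027651281)` reaches $m = 45041$ and returns `(44021, 46061)`.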


Exercise 10.6.1. Factor 1649 using Fermat’s method. 


Gauss and other authors further developed Fermat’s ideas, most importantly
realizing that if $x^2 \equiv y^2 \pmod n$ with $x \not\equiv \pm y \pmod n$ and $(x,n) = 1$, then

$$\gcd(n, x - y) \cdot \gcd(n, x + y)$$

gives a non-trivial factorization of $n$.
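The splitting step itself is just two gcd computations. The following is a minimal Python sketch (the function name is ours). It uses `math.gcd`, which returns a non-negative result even when $x - y$ is negative. The test values below come from the square roots of 1 found later in this chapter, namely $781^2 \equiv 1 \pmod{1105}$ and $1065^2 \equiv 1 \pmod{1729}$.

```python
from math import gcd


def split_from_squares(x: int, y: int, n: int):
    """Given x^2 = y^2 (mod n), return (gcd(n, x - y), gcd(n, x + y)).

    Returns None when x = +-y (mod n), since then no factor is revealed.
    """
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 must be congruent mod n")
    if (x - y) % n == 0 or (x + y) % n == 0:
        return None
    return gcd(n, x - y), gcd(n, x + y)
```

For example, `split_from_squares(781, 1, 1105)` gives `(65, 17)`.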


The issue now becomes to rapidly determine two residues $x$ and $y \pmod n$ with
$x \not\equiv \pm y \pmod n$, such that $x^2 \equiv y^2 \pmod n$. Several factoring algorithms
work by generating a sequence of integers $a_1, a_2, \dots$, with each

$$a_i \equiv b_i^2 \pmod n \quad\text{but}\quad a_i \ne b_i^2$$

for some known integer $b_i$, until some subsequence of the $a_i$’s has product equal to
a square, say

$$y^2 = a_{i_1} \cdots a_{i_r}.$$

Then one sets $x^2 = (b_{i_1} \cdots b_{i_r})^2$ to obtain $x^2 \equiv y^2 \pmod n$, and there is a good
chance that $\gcd(n, x - y)$ is a non-trivial factor of $n$.


We want to generate the $a_i$’s so that it is not so difficult to find a subsequence
whose product is a square; to do so, we need to be able to factor the $a_i$. This
is most easily done by only keeping those $a_i$ that have all of their prime factors
$\le B$, for some appropriately chosen bound $B$. Suppose that the primes up to $B$
are $p_1, p_2, \dots, p_k$. If $a_i = p_1^{\alpha_{i,1}} p_2^{\alpha_{i,2}} \cdots p_k^{\alpha_{i,k}}$, then let $v_i = (\alpha_{i,1}, \alpha_{i,2}, \dots, \alpha_{i,k})$, which
is a vector with entries in $\mathbb{Z}$.


Exercise 10.6.2. Show that, for a set $I$ of indices, $\prod_{i \in I} a_i$ is a square if and only if $\sum_{i \in I} v_i \equiv (0,0,\dots,0) \pmod 2$.


Hence to find a non-trivial subset of the $a_i$ whose product is a square, we simply
need to find a non-trivial linear dependency mod 2 amongst the vectors $v_i$. This is
easily achieved through the methods of linear algebra, and such a dependency is guaranteed to exist once
we have generated more than $k$ such integers $a_i$.
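Finding the dependency is Gaussian elimination over the field with two elements. The following is a minimal Python sketch (the name `find_dependency` is ours). Each exponent vector is stored as a bitmask of its parities, and a second bitmask records which of the original vectors have been added in. When a vector reduces to zero, that record is the required subset.

```python
def find_dependency(vectors):
    """Return indices of a nonempty subset of `vectors` summing to 0 mod 2, or None."""
    pivots = {}  # leading bit -> (parity mask, mask of original indices used)
    for i, v in enumerate(vectors):
        mask = sum(1 << j for j, e in enumerate(v) if e % 2)
        combo = 1 << i
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, combo)  # new pivot: v_i is independent so far
                break
            mask ^= pivots[top][0]  # eliminate the leading bit
            combo ^= pivots[top][1]
        else:
            # mask reduced to zero: the vectors recorded in combo sum to 0 mod 2
            return [k for k in range(i + 1) if combo >> k & 1]
    return None
```

With more than $k$ vectors of length $k$ a dependency always exists, so the function cannot return `None` in that case.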


The quadratic sieve factoring algorithm selects the $b_i$ so that it is easy to find
the small prime factors of the $a_i$, using Corollary 2.3.1. There are other algorithms
that attempt to select the $b_i$ so that the $a_i$ are small and therefore more likely to
have small prime factors. We discuss some of these in appendix 10B. The best


Questions on factoring and primality testing 375 


algorithm, the number field sieve, is an analogue of the quadratic sieve algorithm
over number fields.


There are many other cryptographic protocols based on ideas from number 
theory. Some of these will be discussed in the appendices to this chapter. 


References: See and [Knu98], as well as: 


[1] Carl Pomerance, A tale of two sieves, Notices Amer. Math. Soc. 43 (1996), 1473-1485. 
[2] John D. Dixon, Factorization and primality tests, Amer. Math. Monthly 91 (1984), 333-352.


Additional exercises 


Exercise 10.7.1. Suppose that $n$ is an odd composite integer. Prove that for at least half the
pairs $x, y$ with $0 < x, y < n$ and $x^2 \equiv y^2 \pmod n$, we have $1 < \gcd(x - y, n) < n$.


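Exercise 10.7.2. Factor $n = 62749$. Let $m = \lfloor \sqrt{n} \rfloor + 1 = 251$. Compute $(m+i)^2 \pmod n$
for $i = 0, 1, 2, \dots$ and retain those residues whose prime factors are all $\le 11$. We obtain
$251^2 \equiv 2^2 \cdot 3^2 \cdot 7$; $253^2 \equiv 2^2 \cdot 3^2 \cdot 5 \cdot 7$; $257^2 \equiv 2^2 \cdot 3 \cdot 5^2 \cdot 11$; $260^2 \equiv 3^2 \cdot 7^2 \cdot 11$; $268^2 \equiv
3 \cdot 5^2 \cdot 11^2$; $271^2 \equiv 2^2 \cdot 3^5 \cdot 11 \pmod n$. Use this information to factor $n$.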


Exercise 10.7.3. Alice is sending Bob messages using RSA with public key modulus $n =
2027651281$ and encryption exponent $e = 66308903$. Oscar recalls that $n$ is the number Fermat
factored in section 10.6. Find the decryption exponent for Oscar.


We wish to determine how many different odd primes are involved in the Lucas
certificate of section 10.4.


Exercise 10.7.4. Let $n$ be prime and suppose $q_1, \dots, q_k$ are the odd prime factors of $n - 1$.
(a) Prove that the product of these primes, $N_1 := q_1 \cdots q_k$, is $< n/2$.
(b)† To certify that $q_1, \dots, q_k$ are prime we need the set of odd prime factors of $q_1 - 1, \dots, q_k - 1$.
Let’s call those primes $p_1, \dots, p_\ell$. Prove that the product of these primes, $N_2 := p_1 \cdots p_\ell$,
is $< N_1/2^k$.
(c) Generalize this argument to show that if there are $r$ primes to be certified at the $j$th stage,
then $N_{j+1} < N_j/2^r$.
(d)† Prove that if there are $m$ primes that were certified to be prime during all the steps of this
argument, then $2^m < n$. Explain why this implies that primality testing is in NP.


Exercise 10.7.5.† Suppose $n$ is an odd composite, and $a^{(n-1)/2} \equiv 1$ or $-1 \pmod n$ for every $a$
with $(a,n) = 1$. Deduce that $a^{(n-1)/2} \equiv 1 \pmod n$ for every $a$ with $(a,n) = 1$ and that $n$ is a
Carmichael number.


Appendix 10A. Pseudoprime tests using square roots of 1


In section 7.6 we noted that the converse to Fermat’s Little Theorem may be used
to give a quick proof that a given integer $n$ is composite: One simply finds an integer
$a$, not divisible by $n$, for which $a^{n-1} \not\equiv 1 \pmod n$ (if this fails, that is, if $a^{n-1} \equiv 1 \pmod n$
and $n$ is composite, then $n$ is called a base-$a$ pseudoprime). Such a search
often works quickly, especially for randomly chosen values of $n$, but can fail if the
tested $n$ have some special structure. For example, it always fails for Carmichael
numbers, which have the property that $n$ is a base-$a$ pseudoprime for every $a$ with
$(a,n) = 1$. What can we do in these cases? Can we construct a test, based on
similar ideas, that is guaranteed to recognize even these composite numbers?


10.8. The difficulty of finding all square roots of 1 


Lemma 10.1.1 implies that there are at least four distinct square roots of $1 \pmod n$,
for any odd $n$ which is divisible by at least two distinct primes. This suggests that
we might try to prove that a given base-$a$ pseudoprime $n$ is composite by finding a
square root of $1 \pmod n$ which is neither $1$ nor $-1$. (If we can find such a square
root of $1 \pmod n$, then we can partially factor $n$, as discussed in section 10.1.) The
issue then becomes: How do we efficiently search for a square root of 1?


This is not difficult: Since $n$ is a base-$a$ pseudoprime, we have

$$\left(a^{(n-1)/2}\right)^2 = a^{n-1} \equiv 1 \pmod n,$$

and so $a^{(n-1)/2} \pmod n$ is a square root of $1 \pmod n$. By Euler’s criterion we know
that if $p$ is prime, then $a^{(p-1)/2} \equiv (a/p) \pmod p$, so that $a^{(p-1)/2} \equiv 1$ or $-1 \pmod p$. If $n$
is a base-$a$ pseudoprime (and therefore composite), it is feasible that $a^{(n-1)/2} \not\equiv (a/n) \pmod n$,
which would prove that $n$ is composite. If $a^{(n-1)/2} \pmod n$ is neither $1$ nor
$-1$, this allows us to factor $n$ into two parts, since

$$n = \gcd\left(a^{(n-1)/2} - 1, n\right) \cdot \gcd\left(a^{(n-1)/2} + 1, n\right).$$

If $n$ is composite and $a^{(n-1)/2} \equiv (a/n) \pmod n$, then we call $n$ a base-$a$ Euler pseudoprime.
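Testing whether $n$ is a base-$a$ Euler pseudoprime requires the Jacobi symbol $(a/n)$, which can be computed without factoring $n$ by quadratic reciprocity. The following is a minimal Python sketch of that standard algorithm, together with the Euler test itself (the function names are ours).

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, computed via quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):  # (2/n) = -1 exactly when n = 3 or 5 (mod 8)
                result = -result
        a, n = n, a  # reciprocity: the sign flips if both are 3 (mod 4)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def passes_euler_test(n: int, a: int) -> bool:
    """True if a^((n-1)/2) = (a/n) (mod n); a composite n passing is a base-a Euler pseudoprime."""
    j = jacobi(a, n)
    return j != 0 and pow(a, (n - 1) // 2, n) == j % n
```

For example, 341 is a base-2 pseudoprime but not a base-2 Euler pseudoprime, because $2^{170} \equiv 1$ while $(2/341) = -1$. Both 1105 and 1729 do pass the base-2 Euler test.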

For example, 1105 is a Carmichael number, and so $2^{1104} \equiv 1 \pmod{1105}$. We
take the square root, and determine that $2^{552} \equiv 1 \pmod{1105}$. So this method fails
to prove that 1105 is composite, since 1105 is a base-2 Euler pseudoprime. But,
wait a minute, 552 is even, so we can take the square root again, and a calculation
reveals that $2^{276} \equiv 781 \pmod{1105}$. That is, 781 is a square root of 1 mod 1105,
which proves that 1105 is composite. Moreover, since $\gcd(781 - 1, 1105) = 65$ and
$\gcd(781 + 1, 1105) = 17$, we can even factor 1105 as $65 \times 17$.¹³

This property is even more striking mod 1729. In this case $1728 = 2^6 \cdot 27$, so we
can take square roots many times. Indeed, taking successive square roots of $2^{1728}$
we determine that
$$1 \equiv 2^{1728} \equiv 2^{864} \equiv 2^{432} \equiv 2^{216} \equiv 2^{108} \pmod{1729}, \text{ but then } 2^{54} \equiv 1065 \pmod{1729}.$$
This proves that 1729 is composite, and even that

$$1729 = \gcd(1064, 1729) \times \gcd(1066, 1729) = 133 \times 13.$$


This protocol of taking successive square roots can fail to identify that our
given pseudoprime is indeed composite; for example, we cannot use 103 to prove
that either 561 or 1729 is composite, since

$$103^{35} \equiv 1 \pmod{561}, \text{ and so } 103^{70} \equiv \cdots \equiv 103^{560} \equiv 1 \pmod{561},$$
$$103^{27} \equiv -1 \pmod{1729}, \text{ and so } 103^{54} \equiv \cdots \equiv 103^{1728} \equiv 1 \pmod{1729},$$

but such failures are rare (see exercise 10.8.7).


Suppose that $n$ is a composite integer with $n - 1 = 2^k m$ for some integer $k \ge 1$
with $m$ odd. We call $n$ a base-$a$ strong pseudoprime if the sequence of residues

$$(10.8.1)\qquad a^{(n-1)/2} \pmod n,\ a^{(n-1)/4} \pmod n,\ \dots,\ a^{(n-1)/2^k} \pmod n$$

is equal to either

$$1, 1, \dots, 1 \quad\text{or}\quad 1, \dots, 1, -1, *, \dots, *,$$

where the $*$’s stand for any residue mod $n$. These are the only two possibilities if
$n$ is prime, and so if the sequence of residues in (10.8.1) looks like one of these two
possibilities, then this information does not allow us to deduce that $n$ is composite.


On the other hand, if $n$ is not a base-$a$ strong pseudoprime, then we say that
$a$ is a witness (to $n$ being composite). To be more precise:


Definition. Suppose that $n$ is a composite odd integer and $n - 1 = 2^k m$ for some
integer $k \ge 1$ with $m$ odd. Assume that $n$ is a base-$a$ pseudoprime; that is, $a^{n-1} \equiv 1 \pmod n$.
If $a^m \equiv 1 \pmod n$ or $a^{2^j m} \equiv -1 \pmod n$ for some integer $j \ge 0$, then
$n$ is a base-$a$ strong pseudoprime. Otherwise $a$ is a witness (to the compositeness
of $n$), and if $\ell$ is the largest integer for which $a^{2^\ell m} \not\equiv -1$ or $1 \pmod n$, then
$\gcd(a^{2^\ell m} - 1, n)$ is a non-trivial factor of $n$.


13 We have not factored 1105 into prime factors (since 65 factors further as $65 = 5 \times 13$), but rather
into two non-trivial factors.


One can compute high powers modulo $n$ very rapidly using “fast exponentiation”
(a technique we discussed in section 7.13 of appendix 7A), so this strong
pseudoprime test can be done quickly and easily.
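The strong pseudoprime test and the factor it can reveal fit in a short function. The following is a minimal Python sketch (the name `strong_test` and its return convention are ours). It walks up the powers $a^m, a^{2m}, \dots, a^{(n-1)/2} \pmod n$. If a $-1$ appears, or if $a^m \equiv 1$, the sequence is consistent with $n$ being prime. If some power squares to 1 without being $\pm 1$, the sketch returns the factor $\gcd(x - 1, n)$ from the definition.

```python
from math import gcd


def strong_test(n: int, a: int):
    """Strong pseudoprime test for odd n > 2 and 1 < a < n - 1.

    Returns (is_witness, factor); factor is a non-trivial divisor of n when a
    square root of 1 other than +-1 turns up, and None otherwise.
    """
    m, k = n - 1, 0
    while m % 2 == 0:  # write n - 1 = 2^k * m with m odd
        m //= 2
        k += 1
    x = pow(a, m, n)  # a^m mod n by fast exponentiation
    if x == 1 or x == n - 1:
        return False, None
    for _ in range(k - 1):  # compute a^(2m), a^(4m), ..., a^((n-1)/2)
        y = x * x % n
        if y == n - 1:
            return False, None  # a -1 appears: consistent with n prime
        if y == 1:
            return True, gcd(x - 1, n)  # x^2 = 1 but x != +-1
        x = y
    if x * x % n == 1:  # x = a^((n-1)/2) is a square root of 1 other than +-1
        return True, gcd(x - 1, n)
    return True, None  # a^(n-1) != 1 (mod n): a Fermat witness
```

This reproduces the examples above: base 2 yields the factor 65 of 1105 and the factor 133 of 1729, while base 103 is not a witness for 561 or 1729.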


In exercise 10.8.7 we will show that at least three-quarters of the integers $a$, $1 \le
a \le n$, with $(a,n) = 1$ are witnesses for $n$, for each odd composite $n > 9$. So can
we find a witness quickly if $n$ is composite?


• The most obvious idea is to try $a = 2, 3, 4, \dots$ consecutively until we find a
witness. It is believed that there is a witness $\le 2(\log n)^2$, but we cannot prove this
(though we can deduce this from a famous conjecture, the Generalized Riemann
Hypothesis¹⁴).


• Pick integers $a_1, a_2, \dots, a_t, \dots$ from $\{1, 2, 3, \dots, n-1\}$ at random until we
find a witness. By what we wrote above, if $n$ is composite, then the probability that
none of $a_1, a_2, \dots, a_t$ are witnesses for $n$ is $\le 1/4^t$. Thus with a hundred or so such
tests we get a probability that is so small that it is inconceivable that it could occur
in practice; so we believe that any integer $n$ for which none of a hundred randomly
chosen $a$’s is a witness is prime. We call such $n$ “industrial strength primes” since
they have not been proven to be prime, but there is an enormous weight of evidence
that they are not composite.
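The randomized search can be sketched as follows (the function names and the default of 100 trials are our choices). Bases are drawn from $2 \le a \le n-2$, since $a = 1$ and $a = n - 1$ are never witnesses. A `False` result comes with a proof of compositeness. A `True` result only means that $n$ is an industrial strength prime.

```python
import random


def is_witness(a: int, n: int) -> bool:
    """True if a is a witness to the compositeness of the odd integer n > 2."""
    m, k = n - 1, 0
    while m % 2 == 0:
        m //= 2
        k += 1
    x = pow(a, m, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(k - 1):
        x = x * x % n
        if x == n - 1:
            return False
    return True  # no -1 appeared before reaching a^(n-1)


def industrial_strength_prime(n: int, trials: int = 100, rng=random) -> bool:
    """Return False if a random witness is found (n proved composite), else True."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(trials):
        if is_witness(rng.randrange(2, n - 1), n):
            return False
    return True
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. If $n$ is composite, the chance that all 100 trials miss a witness is at most $4^{-100}$.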


This test is a random polynomial time test for compositeness (like our test for 
finding a quadratic non-residue given at the end of appendix 8B). If n is composite, 
then the randomized witness test is almost certain to provide a short proof of n’s 
compositeness in 100 runs of the test. On the other hand, if 100 runs of the test 
do not produce a witness, then we can be almost certain that n is prime, but we 
cannot be absolutely certain since no proof is provided, and therefore we have an 
industrial strength prime. 


In practice the witness test accomplishes Gauss’s dream of quickly distinguish- 
ing between primes and composites, for either we will quickly get a witness to n 
being composite or, if not, we can be almost certain that our industrial strength 
prime is indeed prime. Although this solves the problem in practice, we cannot 
be absolutely certain that we have distinguished correctly when we claim that n is 
prime since we have no proof, and mathematicians like proof. Indeed if you claim 
that industrial strength primes are prime, without proof, then a cynic might not 
believe that your randomly chosen a are so random or that you are unlucky or .... 
No, what we need is a proof that a number is prime when we think that it is. 


Exercise 10.8.1. Find all bases b for which 15 is a base-b Euler pseudoprime. 


Exercise 10.8.2.† We wish to show that for every odd composite $n$ there is an integer $b$, coprime to $n$,
for which $n$ is not a base-$b$ Euler pseudoprime. Suppose not, i.e., that $n$ is a base-$b$ Euler pseudoprime for every
integer $b$ with $(b,n) = 1$.
(a) Show that $n$ is a Carmichael number.
(b) Show that if prime $p$ divides $n$, then $p - 1$ cannot divide $\frac{n-1}{2}$.
(c) Deduce that $(b/n) \equiv (b/p) \pmod p$ for each prime $p$ dividing $n$.
(d) Explain why (c) cannot hold for every integer $b$ coprime to $n$.


14 We discussed the Riemann Hypothesis, and its generalizations, in sections 5.16 and 5.17 of appendix 5D.
Suffice it to say that this is one of the most famous and difficult open problems of mathematics,
so much so that the Clay Mathematics Institute has now offered one million dollars for its resolution
(see http://www.claymath.org/millennium-problems/).


Exercise 10.8.3. Prove that $F_m = 2^{2^m} + 1$ is either a prime or a base-2 strong pseudoprime.


Exercise 10.8.4. Prove that if $n$ is a base-2 pseudoprime, then $2^n - 1$ is a base-2 strong pseudoprime
and a base-2 Euler pseudoprime. Deduce that there are infinitely many base-2 strong
pseudoprimes.


Exercise 10.8.5. Pépin showed that one can test Fermat numbers $F_m$ for primality by using
just one strong pseudoprime test; i.e., $F_m$ is prime if and only if $3^{(F_m-1)/2} \equiv -1 \pmod{F_m}$.
(a) Use exercise 8.5.4 to show that if $F_m$ is prime, then $3^{(F_m-1)/2} \equiv -1 \pmod{F_m}$.
(b) In the other direction show that if $3^{(F_m-1)/2} \equiv -1 \pmod{F_m}$, then $\mathrm{ord}_p(3) = 2^{2^m}$ whenever
prime $p \mid F_m$.
(c) Deduce that $F_m - 1 \le p - 1$ in (b) and so $F_m$ is prime.


Exercise 10.8.6.† (a) Prove that $A := (4^p + 1)/5$ is composite for all primes $p > 3$.
(b) Deduce that $A$ is a base-2 strong pseudoprime.


Exercise 10.8.7. How many witnesses are there mod $n$? Suppose that $n - 1 = 2^k m$ with $m$
odd and $k \ge 1$, and that $n$ has $w$ distinct prime factors. Let $g_p$ be the largest odd integer dividing
$\gcd(p - 1, n - 1)$, and let $2^{R+1}$ be the largest power of 2 dividing $\gcd(p - 1 : p \mid n)$.

(a) Prove that $R \le k - 1$.
(b) Show that (10.8.1) is $1, 1, \dots, 1$ if and only if $a^{g_p} \equiv 1 \pmod{p^e}$ for every prime power $p^e \| n$.
(c) Show that there are $\prod_{p \mid n} g_p$ such integers $a \pmod n$.
(d) Show that if (10.8.1) is $1, 1, \dots, 1, -1, *, \dots, *$, with $r$ $*$’s at the end, then $0 \le r \le R$, and
that this holds if and only if $a^{2^r g_p} \equiv -1 \pmod{p^e}$ for every prime power $p^e \| n$.
(e) Show that there are $\le \prod_{p \mid n} 2^r g_p$ such integers $a \pmod n$.
(f) Show the number of strong pseudoprimes mod $n$ is

$$2^{Rw} \prod_{p \mid n} g_p \cdot \left(1 + \frac{1}{2^w} + \frac{1}{2^{2w}} + \cdots + \frac{1}{2^{(R-1)w}} + \frac{2}{2^{Rw}}\right).$$

(g) Prove that $2^{Rw} \prod_{p \mid n} g_p \le \phi(n)/2^w$, and so deduce that the quantity in (f) is $\le \phi(n)/2^{w-1}$, and so is $\le \frac{1}{4}\phi(n)$
if $w \ge 3$.
(h) Show that there are $\le \frac{1}{4}\phi(n)$ reduced residues mod $n$ which are not witnesses, whenever
$n > 10$, with equality holding if and only if either
• $n = pq$ where $p = 2m + 1$, $q = 4m + 1$ are primes with $m$ odd, or
• $n = pqr$ is a Carmichael number with $p, q, r$ primes each $\equiv 3 \pmod 4$ (e.g., $7 \cdot 19 \cdot 67$).


Appendix 10B. Factoring with squares


An integer $n$ is called $y$-smooth if all of its prime factors are $\le y$.


In section 10.6 we outlined the main ideas in quadratic sieve-type algorithms.
The key question that remains is how, explicitly, to select $b_1, b_2, \dots$ so that if $a_i$
is the least positive residue of $b_i^2 \pmod n$, then there is a good chance that $a_i$ is
$y$-smooth, for an appropriately chosen bound $y$. The idea is to then find a subset,
call it $I$, of the $i$ for which the product $\prod_{i \in I} a_i$ is a square, call it $A^2$. Then, for
$B = \prod_{i \in I} b_i$, we have

$$B^2 = \prod_{i \in I} b_i^2 \equiv \prod_{i \in I} a_i = A^2 \pmod n,$$

and we hope that either $(B - A, n)$ or $(B + A, n)$ is a proper divisor of $n$.


Here are a few methods to determine such a; so that they each have a reasonable 
chance of being y-smooth: 


Random squares 


Pick the $b_i$ at random in $[1, n]$. We guess that the probability that an $a_i$ is $y$-smooth
is roughly the same as the probability that a random integer $\le n$ is $y$-smooth (this
is unproven).
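The random squares method can be sketched end to end in Python. This is a minimal sketch of Dixon's version of the idea. The smoothness bound $y = 50$, the seed, and the test number $84923 = 163 \cdot 521$ are illustrative choices of ours, not taken from the text. The sketch keeps a random $b$ whenever $b^2 \bmod n$ is $y$-smooth, reduces the exponent vectors mod 2, and turns a dependency into a congruence $x^2 \equiv y^2 \pmod n$. When a dependency yields only a trivial gcd, it discards one relation and keeps collecting.

```python
import random
from math import gcd, isqrt


def primes_up_to(y):
    """Primes <= y by trial division (adequate for small y)."""
    return [p for p in range(2, y + 1) if all(p % q for q in range(2, isqrt(p) + 1))]


def exponents_if_smooth(a, primes):
    """Exponent vector of a over `primes`, or None if a is not smooth over them."""
    exps = []
    for p in primes:
        e = 0
        while a % p == 0:
            a //= p
            e += 1
        exps.append(e)
    return exps if a == 1 else None


def find_dependency(vectors):
    """Indices of a nonempty subset whose vectors sum to 0 mod 2, or None."""
    pivots = {}
    for i, v in enumerate(vectors):
        mask = sum(1 << j for j, e in enumerate(v) if e % 2)
        combo = 1 << i
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, combo)
                break
            mask ^= pivots[top][0]
            combo ^= pivots[top][1]
        else:
            return [k for k in range(i + 1) if combo >> k & 1]
    return None


def random_squares_factor(n, y=50, seed=1, max_tries=200_000):
    """Return a proper factor of the odd composite n (not a prime power), or None."""
    rng = random.Random(seed)
    primes = primes_up_to(y)
    relations = []  # pairs (b_i, exponent vector of a_i)
    for _ in range(max_tries):
        b = rng.randrange(2, n - 1)
        g = gcd(b, n)
        if g > 1:
            return g  # b already shares a factor with n
        v = exponents_if_smooth(b * b % n, primes)
        if v is None:
            continue  # a_i = b^2 mod n is not y-smooth
        relations.append((b, v))
        if len(relations) <= len(primes):
            continue  # more than k relations guarantee a dependency
        subset = find_dependency([vec for _, vec in relations])
        x, total = 1, [0] * len(primes)
        for i in subset:
            x = x * relations[i][0] % n
            total = [t + e for t, e in zip(total, relations[i][1])]
        y_root = 1  # square root of the product of the chosen a_i, mod n
        for p, t in zip(primes, total):
            y_root = y_root * pow(p, t // 2, n) % n
        g = gcd(x - y_root, n)
        if 1 < g < n:
            return g
        relations.pop(subset[0])  # trivial split: drop one relation and retry
    return None
```

Because at most half of the dependencies are expected to give a trivial split when $n$ has two prime factors, a handful of extra relations is usually enough.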


Euler’s sum of squares method 


If we can write $n = u^2 + dv^2 = r^2 + ds^2$ where $d, r, s, u, v > 0$ with $s \ne v$ and
$(d, n) = 1$, then we have

$$d(su)^2 = u^2 \cdot ds^2 \equiv (-dv^2)(-r^2) = d(rv)^2 \pmod n,$$

so that, as $(d, n) = 1$, $n$ divides $\gcd(n, su - rv) \cdot \gcd(n, su + rv)$. This gives a factorization of $n$, for
if not, then $n$ must divide either $su - rv$ or $su + rv$. Now


$$n^2 = (u^2 + dv^2)(r^2 + ds^2) = (ur + dsv)^2 + d(us - rv)^2 = (ur - dsv)^2 + d(us + rv)^2.$$


If, say, $n$ divides $su - rv$, then $n^2$ divides $n^2 - d(us - rv)^2 = (ur + dsv)^2$ and so $n$
divides $ur + dsv$. Dividing through by $n^2$ we get a solution to $1 = a^2 + db^2$ in integers, with
$a = (ur + dsv)/n \ge 1$ and $b = (us - rv)/n$, and so $b = 0$ as $d > 0$; therefore $us - rv = nb = 0$. Now $(u,v) = (r,s) = 1$ and so $s = v$,
which contradicts our assumption. A similar proof holds if $n$ divides $su + rv$.
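Euler's method can be tried on his classical example $1000009 = 1000^2 + 3^2 = 972^2 + 235^2$ (with $d = 1$). This example is a standard one and is not taken from the text above. The following is a minimal Python sketch (the function names are ours). It finds representations $n = u^2 + dv^2$ by brute force over $v$, which is only sensible for small $n$. When $d = 1$, the swapped pair $(v, u)$ is the same representation, so only $u \ge v$ is kept.

```python
from math import gcd, isqrt


def representations(n, d):
    """All (u, v) with u, v > 0 and n = u^2 + d v^2, by brute force over v."""
    reps = []
    v = 1
    while d * v * v < n:
        u2 = n - d * v * v
        u = isqrt(u2)
        if u * u == u2 and (d != 1 or u >= v):  # for d = 1 skip the swapped copy
            reps.append((u, v))
        v += 1
    return reps


def euler_split(n, d):
    """Use n = u^2 + d v^2 = r^2 + d s^2 to return (gcd(n, su - rv), gcd(n, su + rv))."""
    reps = representations(n, d)
    if len(reps) < 2:
        return None
    (u, v), (r, s) = reps[0], reps[1]
    return gcd(n, s * u - r * v), gcd(n, s * u + r * v)
```

Here $su - rv = 235 \cdot 1000 - 972 \cdot 3 = 232084$ and $su + rv = 237916$, and the two gcds give $1000009 = 3413 \times 293$.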


The Continued fractions method 


In section of appendix 11B we will see that if $p/q$ is a convergent to $\sqrt{n}$, then
$|p^2 - nq^2| < 2\sqrt{n} + 1$. Hence above we can take $b_i = p_i$ so that $|a_i| < 2\sqrt{n} + 1$. We
discussed earlier that for most $n$ the continued fraction for $\sqrt{n}$ has period length
about $\sqrt{n}$, so this algorithm gives us many values of $a_i$, in fact far more than we
will typically need. The sizes of $p_i$ and $q_i$ grow exponentially with $i$, which is not
good for computations, but since we only need $p_i \bmod n$ we can work mod $n$ when
computing the $p_i$; that is, we simply compute $p_{i+1} \equiv r_{i+1} p_i + p_{i-1} \pmod n$ and
$q_{i+1} \equiv r_{i+1} q_i + q_{i-1} \pmod n$ for each $i \ge 1$, where $\sqrt{n} = [r_0, r_1, \dots]$. We can
determine the $r_i$ as in section 11.9 of appendix 11B (where the $r_i$ here are written
as $a_i$ there), so that the numbers involved in the calculation are all $< n$.
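The continued fraction method can be sketched with the usual integer recurrence for the partial quotients of $\sqrt{n}$, which we use here as a stand-in for the procedure of section 11.9 (we have not reproduced that section). Only $p_i \bmod n$ is carried along, as the text suggests. The function name and the test number $84923$ are our choices. Each residue is reported as a least absolute residue; once $n$ is not tiny this equals $p_i^2 - nq_i^2$, so its size stays below $2\sqrt{n} + 1$.

```python
import math


def cf_residues(n, count):
    """Pairs (p_i mod n, least absolute residue of p_i^2 mod n) for the first
    `count` convergents p_i/q_i of sqrt(n); n must not be a perfect square."""
    r0 = math.isqrt(n)
    if r0 * r0 == n:
        raise ValueError("n must not be a perfect square")
    # Partial quotients of sqrt(n): mm_{i+1} = dd_i r_i - mm_i,
    # dd_{i+1} = (n - mm_{i+1}^2) / dd_i, r_{i+1} = floor((r_0 + mm_{i+1}) / dd_{i+1}).
    mm, dd, r = 0, 1, r0
    p_prev, p = 1, r0 % n  # p_{-1} = 1, p_0 = r_0
    out = []
    for _ in range(count):
        a = p * p % n
        if a > n // 2:
            a -= n  # least absolute residue of p_i^2 mod n
        out.append((p, a))
        mm = dd * r - mm
        dd = (n - mm * mm) // dd
        r = (r0 + mm) // dd
        p_prev, p = p, (r * p + p_prev) % n  # p_{i+1} = r_{i+1} p_i + p_{i-1} (mod n)
    return out
```

For $n = 84923$ the first two residues are $291^2 - n = -242$ and $583^2 - 4n = 197$.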


10.9. Factoring with polynomial values 
Let $m = \lfloor \sqrt{n} \rfloor$, and then let $b_i = m + i$, so that $a_i = (m+i)^2 - n$ if $1 \le i < (\sqrt{2} - 1)m$. Now
$$a_i = i^2 + 2im + (m^2 - n) \le 2im + i^2 < 3i\sqrt{n} \quad\text{provided } i < m/2,$$


so that the $a_i$ are not much bigger than $\sqrt{n}$. The probability that a random integer
up to $n^{1/2+\epsilon}$ is $y$-smooth is significantly higher than for a random number up to $n$.


An important computational issue (that we have not mentioned before) is to
determine which of the $a_i$ are $y$-smooth. In the random squares method one has
little option but to test divide to see whether each $a_i$ is $y$-smooth.¹⁵ In this “factoring
with polynomial values” method we can use the formula $a_i = f(i)$, where
$f(t) = t^2 + 2mt + (m^2 - n)$, to determine which of the $a_i$ are divisible by $q$, where
$q$ is the power of a prime $\le y$:


If $i$ is the smallest integer $\ge 0$ for which $q$ divides $a_i$, then $0 \le i \le q - 1$ and $q$
divides $a_j$ whenever $j \equiv i \pmod q$, by Corollary 2.3.1. Moreover $q$ divides $a_{-2m-i}$
(since the sum of the two roots of $f(t)$ is $-2m$), and therefore $q$ divides $a_j$ whenever
$j \equiv -2m - i \pmod q$. To find the smallest $i_p \ge 0$ for which prime $p$ divides $a_{i_p}$,
we test divide $a_0, a_1, \dots$ until we find $i_p$ (if it exists), which must be $< p$. Then
to determine the smallest $i$ for which $a_i$ is divisible by $p^2$ or $p^3$, etc., we use the
algorithm suggested by Propositions 7.20.1 and 7.20.2 of section 7.20 in appendix
7C. Therefore we can easily and quickly find all the prime power divisors of the $a_i$
that are powers of primes $p \le y$. If their product equals $a_i$, then $a_i$ is $y$-smooth.


15 Or come up with some other method, but one always has the disadvantage that one has no prior
knowledge of the prime factors of the $a_i$.


This is called the quadratic sieve because it reminds one of the sieve of Eratosthenes.
There we found primes $p$ by eliminating (“sieving out”) every $p$th value
of the polynomial $t$, starting from $2p$. Here we find $y$-smooth integers by sieving
through every $p$th value of the polynomial $f(t)$, starting from $i_p$, and then from the
least residue of $-2m - i_p \pmod p$, to determine those $a_i$ divisible by $p$.


The large prime variation 


By the end of the quadratic sieve process we can write each $a_i = r_i s_i$ where $s_i$ is
the $y$-smooth part of $a_i$ and has been completely factored, while all of the prime
factors of $r_i$ are $> y$. If $r_i = 1$, then $a_i$ is $y$-smooth, as desired. It has proved to be
useful to also retain $r_i$ if it is itself a prime that is not too much larger than $y$:


Exercise 10.9.1. Show that if $r_i = r_j$, then $a_i a_j$ is a square times a $y$-smooth integer.
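In practice the single large prime variation needs a way to spot repeated large cofactors. Grouping the partial relations by $r_i$ with a dictionary does this. The following is a minimal Python sketch (the function name and the sample data are hypothetical). It pairs each later occurrence of a large prime with the first occurrence, so every pair $(i, j)$ gives a product $a_i a_j$ that, by exercise 10.9.1, is a square times a $y$-smooth integer.

```python
from collections import defaultdict


def pair_large_primes(partials):
    """partials: list of (i, r_i), where a_i = r_i * s_i and r_i is a single prime > y.

    Returns index pairs (i, j) with r_i == r_j.
    """
    by_prime = defaultdict(list)
    for i, r in partials:
        by_prime[r].append(i)
    pairs = []
    for indices in by_prime.values():
        # Pair every later occurrence with the first one.
        pairs.extend((indices[0], j) for j in indices[1:])
    return pairs
```

For made-up data `[(0, 53), (1, 59), (2, 53), (3, 61), (4, 53)]` the sketch returns `[(0, 2), (0, 4)]`. The unmatched primes 59 and 61 are simply left unused.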


In practice people also use the double large prime variation in which one also
keeps $r_i$ if it has no more than two prime factors.


Exercise 10.9.2. Show that if $\ell$, $p$, and $q$ are primes $> y$ with $r_i = \ell p$, $r_j = pq$, and $r_k = \ell q$,
then $a_i a_j a_k$ is a square times a $y$-smooth integer.


The key issue is to determine how fast each of these algorithms (and their 
variants) factor a typical integer n. It is not always so easy to determine the running 
time precisely, because we may not know how to analyze how long a particular step 
in an algorithm will take (for example, in “polynomial values” we are unable to 
prove how often such numbers are y-smooth, so we make the assumption that 
they behave much like random numbers of the same size, so as to be able to do 
the analysis). Also, the best algorithms in practice tend to use some random 
choice somewhere (like “random squares” above) and so, if our luck is out, then the 
algorithm could last far longer than expected—nonetheless it suffices to determine 
an “expected running time”. All of these variations of the quadratic sieve algorithm 
give roughly the same expected running time: If $n$ has $d$ digits (that is, $10^{d-1} \le n < 10^d$),
then, with probability close to 1, the quadratic sieve
will factor $n$ in around $C^{\sqrt{d \log d}}$ steps; the different variations give rise to different
values of $C$ but all with $C > 1$.

None of these is the fastest factoring algorithm known. The fastest known is
the number field sieve, which is a version of the quadratic sieve algorithm that
works in number fields, and exploits the structure of number fields. The details
are beyond this book. The number field sieve will factor a typical integer $n$ with $d$
digits in around $C^{d^{1/3} (\log d)^{2/3}}$ steps, with probability close to 1.


Appendix 10C. Identifying primes of a given size


There are many situations in which one requires a prime of a certain size. For
example the Goldbach conjecture (that every even integer $> 2$ is the sum of two
primes) is an open question but has been numerically verified for all $n \le 4 \times 10^{18}$ by
Oliveira e Silva in 2013. In Helfgott’s proof of the ternary Goldbach conjecture, he
began by using deep ideas to show that every odd integer $n > n_0$ is the sum of three
primes, where $n_0$ is a little smaller than $10^{31}$. Then Platt and Helfgott showed by
calculations that the conjecture holds for all odd $n$ in the range $7 < n \le n_0$. At first
sight this might appear to “just be a calculation”, but we have little hope of being
able to do anything like $10^{31}$ steps in an algorithm in practice, and so we need a
clever idea. The idea is to find a sequence of primes $p_1 < p_2 < \cdots < p_k < p_{k+1}$, with $p_{k+1} \ge n_0$
and each $p_{j+1} - p_j \le 4 \times 10^{18} - 4$. Given any odd integer $n$, $100 < n \le n_0$, let $j$ be the
largest integer with $p_j \le n - 4$. Therefore $4 \le n - p_j \le 4 + p_{j+1} - p_j \le 4 \times 10^{18}$,
and so $n - p_j$ is the sum of two primes by Oliveira e Silva’s calculations. Therefore
$n$ is the sum of three primes.


To make this work we only need to determine a suitable sequence of primes $p_j$.
The difference between consecutive $p_j$ should be around $4 \times 10^{18}$, and $n_0 < 10^{31}$,
so we need to find fewer than $3 \times 10^{12}$ primes, a quantity that is much more manageable.
So the computational question becomes: How do we efficiently find a prime close
to a given integer $x$?
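The covering argument above can be tried at toy scale. The following is a minimal sketch, not Platt and Helfgott's code. It assumes a hypothetical bound `G = 200` standing in for $4 \times 10^{18}$ (Goldbach is checked directly below `G`) and `N0 = 5000` standing in for $n_0$. It builds a ladder of primes with consecutive gaps below `G - 4`, then writes each odd $n$ in range as $p_j$ plus two primes. The names `prime_ladder` and `three_primes` are illustrative.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_ladder(start, limit, gap):
    """Primes start = p_1 < p_2 < ... < p_{k+1}, every gap < `gap`, with p_{k+1} >= limit."""
    ladder = [start]
    while ladder[-1] < limit:
        q = ladder[-1] + gap - 1            # largest candidate keeping the gap below `gap`
        while not is_prime(q):
            q -= 1
        if q == ladder[-1]:
            raise ValueError("no prime in the allowed window")
        ladder.append(q)
    return ladder


def three_primes(n, ladder):
    """Write odd n as p_j + q + s, where p_j is the largest ladder prime below n - 4."""
    pj = max(p for p in ladder if p < n - 4)
    m = n - pj                              # even, and 4 < m < G because the gaps are < G - 4
    for q in range(2, m // 2 + 1):
        if is_prime(q) and is_prime(m - q):
            return pj, q, m - q
    return None                             # would contradict Goldbach below G


# Hypothetical toy parameters: G plays the role of 4 x 10^18, N0 the role of n_0.
G, N0 = 200, 5000
LADDER = prime_ladder(7, N0, G - 4)
```

Only about `N0 / G` ladder primes are needed. This mirrors the reduction from $10^{31}$ steps to roughly $3 \times 10^{12}$ primes.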


Pseudoprime tests. We expect around 1 in every $\log x$ integers around $x$ to
be prime (as discussed in chapter 5), so we should not have to search for long if
we simply test $x, x+1, \ldots$ for primality until we find a prime. Indeed Cramér's
conjecture (see appendix 5C) implies that we should find a prime by the time we
get to $x + 2(\log x)^2$. However, finding a prime and proving that it is a prime are
two different issues. As yet we have not seen a particularly efficient primality test
for an arbitrary given integer $n$, though we could verify whether
$$2^{n-1} \equiv 1 \pmod n. \qquad (10.10.1)$$


If so, then $n$ is either a prime or a base-2 pseudoprime; if not, it cannot be prime.
Erdős proved that there are far fewer base-2 pseudoprimes than there are primes,
so if a randomly selected integer $n$ satisfies (10.10.1), then it is very likely to be
prime. To be more precise, for any $A > 0$ we have
$$\#\{n \le x : n \text{ is a base-2 pseudoprime}\} \le \frac{x}{(\log x)^A}$$
if $x$ is sufficiently large. Let's take $A = 10$, so that a randomly chosen $d$-digit
integer (with $d$ sufficiently large) is a base-2 pseudoprime with probability $< 1/d^{10}$,
whereas we know that a randomly chosen $d$-digit integer is prime with probability
around $1/d$. For $d = 100$ the chance that such an $n$ is not prime is minuscule.¹⁶ However,
even if we find such an $n$, we do not necessarily have an easy way to prove that it
is prime, so we need some other tool to guarantee that an integer $n$ that satisfies
(10.10.1) really is prime.

We will construct a primality test that works efficiently for integers $n$ of a
certain form (though not for all $n$). There are enough $n$ of this form that we
can expect to find such an $n$ fairly rapidly in any given, sufficiently long,
interval.
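The test (10.10.1) costs a single modular exponentiation. The sketch below applies it to odd $n$ and lists the composites that slip through below 2000, which shows how rare base-2 pseudoprimes are compared with primes. The function names are illustrative, and trial division is used only to label the survivors.

```python
def is_prime(n):
    """Trial division, used only to label the numbers that pass (10.10.1)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def passes_base2_test(n):
    """True if 2^(n-1) = 1 (mod n), the congruence (10.10.1); n odd, n >= 3."""
    return pow(2, n - 1, n) == 1


def base2_pseudoprimes(limit):
    """Odd composite n < limit that nevertheless satisfy (10.10.1)."""
    return [n for n in range(3, limit, 2) if passes_base2_test(n) and not is_prime(n)]


print(base2_pseudoprimes(2000))   # [341, 561, 645, 1105, 1387, 1729, 1905]
```

Every odd prime passes, by Fermat's little theorem. Below 2000 there are 302 odd primes but only these 7 impostors.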


10.10. The Proth-Pocklington-Lehmer primality test 


In section 10.4 we saw that one can prove that $n$ is prime if one can find an integer $g$
such that $g^{n-1} \equiv 1 \pmod n$ and $(g^{(n-1)/q} - 1, n) = 1$ for every prime $q$ dividing $n - 1$.
This only works, in practice, if one can factor $n - 1$ in a reasonable time, something
that is not guaranteed. So it is of interest to modify this method so that less
information is needed to prove that $n$ is prime. The key lemma is due to Pocklington
(building on an idea of Proth):

Lemma 10.10.1. Suppose that $a^{n-1} \equiv 1 \pmod n$ and $(a^{(n-1)/q} - 1, n) = 1$, where
$q$ is a prime and $n \equiv 1 \pmod{q^e}$. If prime $p$ divides $n$, then $p \equiv 1 \pmod{q^e}$.


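Proof. Let $m$ be the order of $a \pmod p$. Now $a^{n-1} \equiv 1 \pmod p$, but $a^{(n-1)/q} \not\equiv 1 \pmod p$
(as $p$ divides $n$, which is coprime to $a^{(n-1)/q} - 1$), so that $m$ divides $n - 1$ but not $(n-1)/q$; that is, $q \nmid (n-1)/m$. Therefore
if $q^f \| n - 1$, then $q^f \| m$, where $f \ge e$, and $m$ divides $p - 1$. The result follows.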


In 1928, Lehmer realized that one can use this to test n for primality without 
needing to fully factor n — 1, but rather factor just “half of it”. 


Theorem 10.2. Let $n - 1 = FR$, where $F$ is fully factored and $F > \sqrt{n}$. If $g^{n-1} \equiv 1 \pmod n$
and $(g^{(n-1)/q} - 1, n) = 1$ for every prime $q$ dividing $F$, then $n$ is prime.


Proof. By Lemma 10.10.1 we know that if $p$ is a prime dividing $n$, then $p \equiv 1 \pmod{q^e}$
for each $q^e \| F$, and so $p \equiv 1 \pmod F$. Therefore $p > F > \sqrt{n}$, but $n$
cannot have two such prime factors, so $n$ must be prime.
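Theorem 10.2 can be turned directly into a search for a certifying base. The sketch below assumes that the caller supplies the distinct primes $q$ of a fully factored part of $n - 1$. It takes $F$ to be the largest divisor of $n - 1$ built from those primes and tries small bases $g$. The name `lehmer_certificate` and the cap `max_base` are hypothetical choices.

```python
from math import gcd


def lehmer_certificate(n, qs, max_base=200):
    """Search for a base g proving n prime by Theorem 10.2.

    `qs` lists distinct primes dividing n - 1; F is the part of n - 1 made of
    these primes (so F is fully factored).  Returns g on success, or None if some
    g fails the Fermat congruence (n is then composite) or no base is found.
    """
    F, rest = 1, n - 1
    for q in qs:
        if rest % q:
            raise ValueError(f"{q} does not divide n - 1")
        while rest % q == 0:
            rest //= q
            F *= q
    if F * F <= n:
        raise ValueError("Theorem 10.2 needs F > sqrt(n)")
    for g in range(2, min(n, max_base)):
        if pow(g, n - 1, n) != 1:
            return None                     # Fermat fails, so n is composite
        if all(gcd(pow(g, (n - 1) // q, n) - 1, n) == 1 for q in qs):
            return g                        # every hypothesis of Theorem 10.2 holds
    return None
```

For example, $96 = 2^5 \cdot 3$ and $F = 32 > \sqrt{97}$, so only the prime 2 is needed for $n = 97$. The first base that is a quadratic non-residue, $g = 5$, certifies primality. Only part of $n - 1$ (here $F = 112$ out of $1008$) needs to be factored for $n = 1009$.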


16 Assuming that d = 100 qualifies as “sufficiently large”. 




Exercise 10.10.1 (Proth's Theorem). Suppose that $n = k \cdot 2^m + 1$, where $k < 2^m$. Show that $n$
is prime if and only if there exists an integer $a$ for which $a^{(n-1)/2} \equiv -1 \pmod n$.


Most large primes that have been found are of this form, because this is a 
relatively easy test to implement in practice, using fast exponentiation. Helfgott and 
Platt used Proth’s Theorem in their calculations in resolving the ternary Goldbach 
problem. 
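Proth's theorem needs only one fast exponentiation per base. In the sketch below, the list of trial bases is our own assumption, since the exercise only asserts that some $a$ exists. Any base with $a^{(n-1)/2} \not\equiv \pm 1 \pmod n$ proves $n$ composite, because a prime $n$ always gives $\pm 1$ by Euler's criterion. If no listed base decides, the function returns `None`.

```python
from math import gcd


def proth_test(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47)):
    """Proth's theorem for n = k*2^m + 1 with k < 2^m.

    Returns True if some base a has a^((n-1)/2) = -1 (mod n), which proves n prime;
    False if n is shown to be composite; None if no base in `bases` settles it.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    k, m = n - 1, 0
    while k % 2 == 0:                       # write n - 1 = k * 2^m with k odd
        k //= 2
        m += 1
    if k >= 2 ** m:
        raise ValueError("n is not of the form k*2^m + 1 with k < 2^m")
    for a in bases:
        if a % n == 0:
            continue                        # a multiple of n gives no information
        if gcd(a, n) > 1:
            return False                    # a proper common factor: n is composite
        x = pow(a, (n - 1) // 2, n)
        if x == n - 1:
            return True                     # a^((n-1)/2) = -1: n is prime
        if x != 1:
            return False                    # impossible for prime n (Euler's criterion)
    return None
```

For instance, $41 = 5 \cdot 2^3 + 1$ is certified by $a = 3$. By contrast, $57 = 7 \cdot 2^3 + 1$ is exposed as composite by $a = 2$.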


Exercise 10.10.2. Suppose that $m > 1$.
(a) Show that $n = 2^m + 1$ is prime if and only if there exists an integer $a$ with $a^{(n-1)/2} \equiv -1 \pmod n$,
if and only if $3^{(n-1)/2} \equiv -1 \pmod n$.
(b) Let $u_0 = 3$ and $u_{k+1} = u_k^2$ for all $k \ge 0$. Prove that $2^m + 1$ is prime if and only if
$u_{m-1} \equiv -1 \pmod{2^m + 1}$. (This should be easy to implement algorithmically.)
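As the exercise suggests, part (b) translates into a few lines of code. The sketch below computes $u_{m-1} = 3^{2^{m-1}} \bmod (2^m + 1)$ by $m - 1$ successive squarings. The function name is illustrative.

```python
def two_power_plus_one_is_prime(m):
    """Exercise 10.10.2(b): for m > 1, 2^m + 1 is prime iff u_{m-1} = -1 (mod 2^m + 1),
    where u_0 = 3 and u_{k+1} = u_k^2."""
    if m <= 1:
        raise ValueError("need m > 1")
    n = 2 ** m + 1
    u = 3                                   # u_0
    for _ in range(m - 1):                  # after the loop, u = u_{m-1} = 3^(2^(m-1)) mod n
        u = u * u % n
    return u == n - 1


print([m for m in range(2, 40) if two_power_plus_one_is_prime(m)])   # [2, 4, 8, 16]
```

The output recovers the Fermat primes $5, 17, 257, 65537$, and it correctly rejects $2^{32} + 1$.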


What if one can factor a lot of n—1 but not quite half of it? In 1975, Brillhart, 
Lehmer, and Selfridge showed that one can proceed if one can factor a third of it: 


Theorem 10.3. Let $n - 1 = FR$, where $F$ is fully factored and $F > (2n)^{1/3}$. Let $r$
be the least residue of $R \pmod F$ and $s = \lfloor R/F \rfloor = (R - r)/F$, and suppose that
$r^2 - 4s$ is not a square. If $g^{n-1} \equiv 1 \pmod n$ and $(g^{(n-1)/q} - 1, n) = 1$ for every
prime $q$ dividing $F$, then $n$ is prime.


Proof. As in the proof of Theorem 10.2, we know that if $p$ is a prime dividing
$n$, then $p \equiv 1 \pmod F$. Therefore $p > F > n^{1/3}$, and so if $n$ is not prime, then it
must have exactly two prime factors; call them $1 + aF$ and $1 + bF$. Therefore
$R = (n-1)/F = (a + b) + abF$. Now $a, b \le ab < n/F^2 < F/2$, so that $a + b < F$.
Moreover $r \equiv R \equiv a + b \pmod F$, with $0 \le r < F$, and so $r = a + b$. We also have
$s = (R - (a + b))/F = ab$, and so $r^2 - 4s = (a - b)^2$, contradicting the hypothesis.
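Here is a minimal sketch of the hypotheses of Theorem 10.3. It assumes that $F$ is small enough to factor by trial division (the theorem's "fully factored" part). The function names are hypothetical. For $n = 1009$ we have $n - 1 = 16 \cdot 63$, and $F = 16$ exceeds $(2n)^{1/3} \approx 12.6$ but not $\sqrt{n} \approx 31.8$, so Theorem 10.2 does not apply but Theorem 10.3 does.

```python
from math import gcd, isqrt


def prime_factors(m):
    """Distinct prime factors of m by trial division."""
    ps, d = [], 2
    while d * d <= m:
        if m % d == 0:
            ps.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        ps.append(m)
    return ps


def bls_proves_prime(n, F, g):
    """True if n - 1 = F*R, the base g and F satisfy every hypothesis of Theorem 10.3."""
    if (n - 1) % F or F ** 3 <= 2 * n:
        return False                        # need F | n - 1 and F > (2n)^(1/3)
    R = (n - 1) // F
    r, s = R % F, R // F                    # least residue of R mod F, and floor(R/F)
    disc = r * r - 4 * s
    if disc >= 0 and isqrt(disc) ** 2 == disc:
        return False                        # r^2 - 4s must not be a square
    if pow(g, n - 1, n) != 1:
        return False
    return all(gcd(pow(g, (n - 1) // q, n) - 1, n) == 1 for q in prime_factors(F))


def find_bls_base(n, F, max_base=100):
    """Smallest base g < max_base for which bls_proves_prime succeeds, or None."""
    return next((g for g in range(2, max_base) if bls_proves_prime(n, F, g)), None)
```

For $n = 1009$, $r = 15$ and $s = 3$, so $r^2 - 4s = 213$ is not a square. The bases $2, 3, 5, 7$ are quadratic residues and fail, and $g = 11$ succeeds.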


We have seen that we can verify whether $n$ is prime if $n - 1$ is largely factored.
If $n - 1$ is difficult to largely factor, then we can proceed when $n + 1$ is easy to
mostly factor, by using second-order linear recurrence sequences.


Second-order linear recurrences 


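We can do something similar for second-order linear recurrences. Let $u_0 = 0$,
$u_1 = 1$, and $u_{n+2} = a u_{n+1} - u_n$ for all $n \ge 0$. Let $\Delta = a^2 - 4$, $\alpha = \frac{a + \sqrt{\Delta}}{2}$, and
$\beta = \frac{a - \sqrt{\Delta}}{2}$, so that $\alpha\beta = 1$. Recall that $u_n = \frac{\alpha^n - \beta^n}{\alpha - \beta}$.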


Theorem 10.4. Suppose that $(n, 2\Delta) = 1$ and let $\delta = \left(\frac{\Delta}{n}\right)$. Let $n - \delta = 2FR$,
where $F$ is even, fully factored, and $F > (\sqrt{n} + 1)/2$. If $u_{(n-\delta)/2} \equiv 0 \pmod n$ and
$(u_{(n-\delta)/2q}, n) = 1$ for every prime $q$ dividing $F$, then $n$ is prime.


Proof. Suppose that prime $p$ divides $n$. Now as $(p, \alpha - \beta)$ divides $(n, \alpha - \beta)$, which
divides $(n, \Delta) = 1$, we deduce that $u_k \equiv 0 \pmod p$ if and only if $\alpha^k \equiv \beta^k \pmod p$.
(These congruences are between elements of $\mathbb{Z}[\sqrt{\Delta}]$; the congruence classes have
representatives $c + d\sqrt{\Delta}$ with $0 \le c, d \le p - 1$.) As $\alpha$ is a unit, $\alpha^k \equiv \beta^k \pmod p$
holds if and only if $\alpha^{2k} = \alpha^k \beta^{-k} \equiv 1 \pmod p$.

The hypothesis, together with the first paragraph, implies that $\alpha^{n-\delta} \equiv 1 \pmod p$
but $\alpha^{(n-\delta)/q} \not\equiv 1 \pmod p$ for each prime $q$ dividing $F$. Therefore, if $m$ is the order of $\alpha \pmod p$,
then $m$ divides $n - \delta$ but not $(n - \delta)/q$; that is, $q \nmid (n - \delta)/m$. Hence if $q^e \| n - \delta$,
then $q^e \| m$, and so $2F$ divides $m$, which divides $p - \left(\frac{\Delta}{p}\right)$ by Corollary 8.18.1. That
is, $p \equiv \left(\frac{\Delta}{p}\right) \pmod{2F}$, and so $p \ge 2F - 1 > \sqrt{n}$. Therefore $n$ cannot have two
prime factors, so must be prime.
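The values $u_k \bmod n$ can be computed quickly with the matrix identity $\begin{pmatrix} a & -1 \\ 1 & 0 \end{pmatrix}^k = \begin{pmatrix} u_{k+1} & -u_k \\ u_k & -u_{k-1} \end{pmatrix}$. The sketch below uses this identity, together with a standard Jacobi symbol routine, to check every hypothesis of Theorem 10.4. The function names are hypothetical, and the caller is assumed to supply the primes $q$ dividing $F$.

```python
from math import gcd


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0 (standard reciprocity algorithm)."""
    a, result = a % n, 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_u(a, k, n):
    """u_k mod n for u_0 = 0, u_1 = 1, u_{j+2} = a*u_{j+1} - u_j (2x2 matrix powering)."""
    def mul(X, Y):
        return [[(X[i][0] * Y[0][j] + X[i][1] * Y[1][j]) % n for j in range(2)]
                for i in range(2)]
    result, base = [[1, 0], [0, 1]], [[a % n, -1 % n], [1, 0]]
    while k:
        if k & 1:
            result = mul(result, base)
        base = mul(base, base)
        k >>= 1
    return result[1][0] % n


def theorem_10_4(n, a, qs):
    """True if n, the parameter a and the primes qs satisfy every hypothesis of Theorem 10.4."""
    D = a * a - 4
    if n < 3 or n % 2 == 0 or gcd(n, 2 * D) != 1:
        return False
    half = (n - jacobi(D, n)) // 2          # (n - delta)/2 = F*R
    F, rest = 1, half
    for q in qs:
        if rest % q:
            return False                    # each q must divide (n - delta)/2
        while rest % q == 0:
            rest //= q
            F *= q
    if F % 2 or (2 * F - 1) ** 2 <= n:
        return False                        # need F even and F > (sqrt(n) + 1)/2
    if lucas_u(a, half, n) != 0:
        return False
    return all(gcd(lucas_u(a, half // q, n), n) == 1 for q in qs)
```

With $a = 3$ the sequence is $0, 1, 3, 8, 21, 55, \ldots$ (every other Fibonacci number). With $a = 4$ and $n = 127 = 2^7 - 1$, the only prime needed is $q = 2$, since $n + 1 = 128$.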


There remains the question of finding such a recurrence sequence $(u_k)_{k \ge 0}$ when
$n + 1$ is partially factored. Morrison [2] showed that one can find second-order linear
recurrence sequences that allow one to use a modification of Theorem 10.4 to prove
the primality of $n$. (See Theorem 4.2.4 in [CP05].)


Just as we adapted the ideas for a factorable $n - 1$ to test whether
Fermat numbers are prime (as in Exercise 10.10.2), we can use these new ideas to
test whether Mersenne numbers are prime.


Corollary 10.10.1 (The Lucas-Lehmer primality test for Mersenne numbers). Let
$w_0 = 4$ and $w_{k+1} = w_k^2 - 2$ for all $k \ge 0$. Suppose that $n$ is odd and $\ge 3$, and let
$M_n = 2^n - 1$ be a Mersenne number. If $w_{n-2} \equiv 0 \pmod{M_n}$, then $M_n$ is prime.


It can also be shown that if $M_n$ is prime, then $w_{n-2} \equiv 0 \pmod{M_n}$.


Proof. We claim that an odd prime $p$ divides at most one $w_k$, for if $w_k \equiv 0 \pmod p$,
then $w_{k+1} \equiv 0^2 - 2 = -2 \pmod p$ and $w_{k+2} \equiv (-2)^2 - 2 = 2 \pmod p$.
From then on $w_{k+j} \equiv 2^2 - 2 = 2 \pmod p$ for all $j \ge 2$, and so $p \nmid w_\ell$ for all $\ell > k$.

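Define the sequence $\{u_n\}_{n \ge 0}$ above with $a = 4$, so that $\Delta = 12$, $\alpha = 2 + \sqrt{3}$,
and $\beta = 2 - \sqrt{3}$. Write $u_{2n} = u_n v_n$, so that $v_n = \alpha^n + \beta^n$. We claim that $w_k = v_{2^k}$
for all $k \ge 0$, since it is true for $k = 0$, and then $w_{k+1} = w_k^2 - 2 = v_{2^k}^2 - 2 = v_{2 \cdot 2^k}$,
as $v_n^2 - 2 = \alpha^{2n} + 2(\alpha\beta)^n - 2 + \beta^{2n} = v_{2n}$ for all $n \ge 1$.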

We deduce $u_{2^k} = w_{k-1} w_{k-2} \cdots w_0$ for all $k \ge 1$: It is obviously true for $k = 1$,
and if it is true for $k - 1$, then $u_{2^k} = v_{2^{k-1}} u_{2^{k-1}} = w_{k-1} \cdot w_{k-2} \cdots w_0$.

Therefore $w_{k-1} \equiv 0 \pmod p$ if and only if $u_{2^k} \equiv 0 \pmod p$ and $(u_{2^{k-1}}, p) = 1$:
By the previous paragraph, the right-hand side can be rewritten as $w_{k-1} \equiv 0 \pmod p$
and $(w_j, p) = 1$ for all $j < k - 1$. By the first paragraph, this is the same
as $w_{k-1} \equiv 0 \pmod p$.

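We apply Theorem 10.4 with $n$ replaced by $M_n$. Since $M_n \equiv 3 \pmod 4$ and $M_n \equiv 1 \pmod 3$ (as $n$ is odd), we have
$\delta = \left(\frac{12}{M_n}\right) = \left(\frac{3}{M_n}\right) = -\left(\frac{M_n}{3}\right) = -1$, so that $(2^n - 1) - \delta = 2^n = 2F$, where $F = 2^{n-1}$ is fully factored.
Therefore if $w_{n-2} \equiv 0 \pmod{M_n}$, then $u_{2^{n-1}} \equiv 0 \pmod{M_n}$ and $(u_{2^{n-2}}, M_n) = 1$
by the previous paragraph, and so $M_n$ is prime, by Theorem 10.4.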
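The Lucas-Lehmer test needs only $n - 2$ squarings modulo $M_n$. The sketch below also relies on the converse stated after the corollary (if $M_n$ is prime, then $w_{n-2} \equiv 0$), so it reports primality in both directions. The function name is illustrative.

```python
def lucas_lehmer(n):
    """Corollary 10.10.1 and its converse: for odd n >= 3, M_n = 2^n - 1 is prime
    iff w_{n-2} = 0 (mod M_n), where w_0 = 4 and w_{k+1} = w_k^2 - 2."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    M = (1 << n) - 1
    w = 4                                   # w_0
    for _ in range(n - 2):                  # after the loop, w = w_{n-2} mod M_n
        w = (w * w - 2) % M
    return w == 0


print([n for n in range(3, 130, 2) if lucas_lehmer(n)])
# [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```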


References for this chapter
[1] John Brillhart, D. H. Lehmer, and J. L. Selfridge, New primality criteria and factorizations of $2^m \pm 1$, Math. Comp. 29 (1975), 620–647.

[2] M. A. Morrison, A note on primality testing using Lucas sequences, Math. Comp. 29 (1975), 181–182.

[3] Paulo Ribenboim, The book of prime number records, 2nd ed., Springer-Verlag, 1989.


Appendix 10D. Carmichael numbers


10.11. Constructing Carmichael numbers 


We have discussed Carmichael numbers in section 7.6 and in section 10.8 of appendix
10A. In particular, in exercise 7.10.19 we came up with a systematic way of finding
infinitely many families of Carmichael numbers with three prime factors: For given
pairwise coprime integers $a, b, c$ we select the residue class $m_0 \pmod{abc}$ via the
Chinese Remainder Theorem so that


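$$m_0 \equiv \begin{cases} -\frac{1}{b} - \frac{1}{c} \pmod a, \\ -\frac{1}{a} - \frac{1}{c} \pmod b, \\ -\frac{1}{a} - \frac{1}{b} \pmod c. \end{cases}$$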


If $m \equiv m_0 \pmod{abc}$ and $am + 1$, $bm + 1$, and $cm + 1$ are all prime, then their
product, $N = (am + 1)(bm + 1)(cm + 1)$, is a Carmichael number.
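The recipe can be run directly. This sketch computes $m_0$ by the Chinese Remainder Theorem from the three congruences above, then walks through $m \equiv m_0 \pmod{abc}$ until $am + 1$, $bm + 1$, $cm + 1$ are all prime. The helper names are illustrative. `pow(x, -1, m)` (Python 3.8+) gives modular inverses, and `is_carmichael` is an independent check via Korselt's criterion.

```python
def is_prime(n):
    """Trial division, adequate for the sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def neg_inv_sum(x, y, mod):
    """-(1/x + 1/y) mod `mod`; returns 0 when mod = 1."""
    if mod == 1:
        return 0
    return -(pow(x, -1, mod) + pow(y, -1, mod)) % mod


def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli; returns (x, product)."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        t = (r - x) * pow(M, -1, m) % m if m > 1 else 0
        x, M = x + M * t, M * m
    return x % M, M


def carmichael_triples(a, b, c, count):
    """First `count` Carmichael numbers (am+1)(bm+1)(cm+1) with m = m_0 (mod abc)."""
    m0, mod = crt([neg_inv_sum(b, c, a), neg_inv_sum(a, c, b), neg_inv_sum(a, b, c)], [a, b, c])
    found, m = [], m0 or mod                # least positive m with m = m_0 (mod abc)
    while len(found) < count:
        if all(is_prime(k * m + 1) for k in (a, b, c)):
            found.append((a * m + 1) * (b * m + 1) * (c * m + 1))
        m += mod
    return found


def is_carmichael(N):
    """Korselt's criterion: N composite, squarefree, and p - 1 | N - 1 for every prime p | N."""
    m, primes, p = N, [], 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return False                # p^2 divides N
            primes.append(p)
        p += 1
    if m > 1:
        primes.append(m)
    return len(primes) >= 2 and all((N - 1) % (q - 1) == 0 for q in primes)
```

With $(a, b, c) = (1, 2, 3)$ we get $m_0 \equiv 0 \pmod 6$, which gives the family $(6t+1)(12t+1)(18t+1)$ starting $1729, 294409, \ldots$. With $(1, 3, 5)$ we get $m_0 \equiv 12 \pmod{15}$, which gives $13 \cdot 37 \cdot 61 = 29341$.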


The prime $k$-tuplets conjecture (stated in the "Bonus Read" after appendix 5A) implies
that there are infinitely many such prime triplets (creating infinite
families of Carmichael numbers for any given pairwise coprime integers $a, b, c$) if
the triplet is admissible. The triplet is admissible if for every prime $p$ there exists
a residue class $m \pmod p$, with $m \equiv m_0 \pmod{abc}$, for which $p$ does not divide
$N$: If prime $p \nmid abc$, let $m \equiv 0 \pmod p$, so that $N \equiv 1 \pmod p$. If $p \mid a$, then $am + 1 \equiv 1 \pmod p$,
$bm + 1 \equiv b(-\frac{1}{b} - \frac{1}{c}) + 1 = -b/c \pmod p$, and similarly $cm + 1 \equiv -c/b \pmod p$,
so that $N \equiv 1 \cdot (-b/c) \cdot (-c/b) = 1 \pmod p$. We get the same congruence, by the
analogous argument, if $p \mid b$ or $p \mid c$.


Another way to construct Carmichael numbers is to begin with one, say $n =
p_1 p_2 \cdots p_k$, and then look for infinitely many Carmichael numbers $N =
q_1 q_2 \cdots q_k$, where the $q_j$ are prime numbers with $q_j - 1 = m(p_j - 1)$ for
$1 \le j \le k$, for some integer $m$. In an earlier exercise we showed that a composite integer $r$ is a
Carmichael number if and only if $\lambda(r)$ divides $r - 1$. Therefore
$$\lambda(n) = \operatorname{lcm}[p_1 - 1, \ldots, p_k - 1] \text{ divides } n - 1.$$




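Now if the $q_j$ are indeed all primes, then
$$\lambda(N) = \operatorname{lcm}[q_1 - 1, q_2 - 1, \ldots, q_k - 1] = m \operatorname{lcm}[p_1 - 1, \ldots, p_k - 1] = m\lambda(n).$$
We select $m \equiv 1 \pmod{\lambda(n)}$, so that $(m, \lambda(n)) = 1$. Now
$$N = \prod_j \big(m(p_j - 1) + 1\big) \equiv 1 \pmod m,$$
and
$$N = \prod_j \big(m(p_j - 1) + 1\big) \equiv \prod_j \big((p_j - 1) + 1\big) = n \equiv 1 \pmod{\lambda(n)}.$$
Therefore $N \equiv 1 \pmod{m\lambda(n)}$, that is, $N \equiv 1 \pmod{\lambda(N)}$. This implies that $N$ is indeed a Carmichael
number, by Lemma 7.6.1.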
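As an illustration, we can start from $n = 561 = 3 \cdot 11 \cdot 17$, where $\lambda(n) = \operatorname{lcm}[2, 10, 16] = 80$, and try $m = 81, 161, 241, \ldots$. The sketch below (function names hypothetical) returns each $m$ for which every $q_j = m(p_j - 1) + 1$ is prime. `math.lcm` requires Python 3.9+.

```python
from functools import reduce
from math import lcm


def is_prime(n):
    """Trial division, adequate for the sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lift_carmichael(primes, count):
    """Search m = 1 (mod lambda(n)), m > 1, for which every m(p_j - 1) + 1 is prime.

    `primes` are the prime factors p_j of a Carmichael number n.  Returns the first
    `count` pairs (m, [q_1, ..., q_k]); each product q_1 ... q_k is then Carmichael.
    """
    lam = reduce(lcm, [p - 1 for p in primes])      # lambda(n) = lcm[p_j - 1]
    found, m = [], 1 + lam                          # m = 1 would just give n back
    while len(found) < count:
        qs = [m * (p - 1) + 1 for p in primes]
        if all(is_prime(q) for q in qs):
            found.append((m, qs))
        m += lam
    return found


print(lift_carmichael([3, 11, 17], 1))   # [(81, [163, 811, 1297])]
```

So $N = 163 \cdot 811 \cdot 1297 = 171454321$, and $N - 1$ is divisible by $162$, $810$, and $1296$, as the argument predicts.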


The prime $k$-tuplets conjecture implies that there are infinitely many such
prime $k$-tuples, and hence infinitely many such Carmichael numbers, as the $k$-tuple
is admissible: If prime $p \nmid \lambda(n)$, let $m \equiv 0 \pmod p$, so that $N \equiv 1 \pmod p$. If
$p \mid \lambda(n)$, then $m \equiv 1 \pmod p$, and so $N \equiv n \equiv 1 \pmod p$.


10.12. Erdős's construction


Fix $\epsilon > 0$. It is believed that there are about $x^{1/3}$ Carmichael numbers up to $x$ with
three prime factors, about $x^{1/4}$ with four prime factors, etc. However, we expect
more than $x^{1-\epsilon}$ Carmichael numbers up to $x$ in total if $x$ is sufficiently large. How
can this be? It seems unlikely that $x^{1/3} + x^{1/4} + x^{1/5} + \cdots$ could possibly be as large as $x^{1-\epsilon}$,
so what is going on? The surprise is that the vast majority of Carmichael numbers
have a "large" number of prime factors, in that the number of prime factors of a
typical Carmichael number up to $x$ goes to $\infty$ surprisingly rapidly as $x$ grows.


In the construction of the previous section we fixed the number of prime factors
of the Carmichael number $N$ (as well as the ratios $q_i - 1 : q_j - 1$ of the distinct
prime factors $q_i, q_j$ of $N$), creating infinitely many families of Carmichael numbers,
but these are atypical Carmichael numbers (as the number of prime factors is fixed). In our
next construction the number of prime factors can vary.

An earlier lemma tells us that a squarefree, composite integer $n$ is a Carmichael
number if and only if $\lambda(n) = \operatorname{lcm}[p - 1 : p \mid n]$ divides $n - 1$. Examples of Carmichael
numbers $n$ indicate that $\lambda(n)$ is typically much smaller than $n$, which is far from
true for a typical integer $n$. Erdős reasoned that if we are going to try to construct
Carmichael numbers, then we should try to make sure that $\lambda(n)$ is surprisingly
small compared to $n$. He approached this as follows:


• Select an integer $L$ with lots of prime factors, for example the lcm of the
integers $\le y$.

• Find a large set $P = P(y)$ of primes $p > y$ for which $p - 1$ divides $L$.

• Find subsets $\{p_1, \ldots, p_k\} \subseteq P$, with $k \ge 2$, whose product $n = p_1 \cdots p_k$ is $\equiv 1 \pmod L$.


Then $\lambda(n)$ divides $L$, which divides $n - 1$, and so $n$ is a Carmichael number.




For $L = 120$ we have $7, 11, 13, 31, 41, 61 \in P$, and we easily identify that
$41041 = 7 \times 11 \times 13 \times 41 \equiv 1 \pmod{120}$, $172081 = 7 \times 13 \times 31 \times 61 \equiv 1 \pmod{120}$, and
$852841 = 11 \times 31 \times 41 \times 61 \equiv 1 \pmod{120}$, and so these are all Carmichael numbers.
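Erdős's three steps can be run exhaustively for $L = 120$. The sketch below builds $P$ as the primes $p$ with $p - 1 \mid L$ and $p \nmid L$. This coprimality condition is our stand-in for "$p > y$", which plays the same role when $L$ is an lcm of the integers up to $y$. It then tests every product of at least two elements of $P$. The function name is illustrative, and `math.prod` needs Python 3.8+.

```python
from itertools import combinations
from math import prod


def is_prime(n):
    """Trial division, adequate for the sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def erdos_carmichael(L):
    """Return (P, hits): P = {p prime : p - 1 | L, p does not divide L}, and all
    products n of at least two elements of P with n = 1 (mod L), as (n, subset)."""
    P = [d + 1 for d in range(1, L + 1) if L % d == 0 and is_prime(d + 1) and L % (d + 1)]
    hits = []
    for k in range(2, len(P) + 1):
        for subset in combinations(P, k):
            n = prod(subset)
            if n % L == 1:
                hits.append((n, subset))
    return P, sorted(hits)


P, hits = erdos_carmichael(120)
print(P)                        # [7, 11, 13, 31, 41, 61]
print([n for n, _ in hits])     # [41041, 172081, 852841]
```

The search shows that the three products above are the only ones for $L = 120$. Only 3 of the $2^6 - 1 = 63$ products (57 with at least two factors) land in the class $1 \bmod 120$. This illustrates why large $L$ requires much searching.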


If we find $k$ primes in $P$, then there are $2^k - 1$ products to test in the last
part of the algorithm. Assuming those products are randomly distributed mod $L$,
we expect to construct about $2^k/L$ Carmichael numbers in this way. One problem,
though, is that if $L$ is large, then it requires a great deal of searching to find each
such Carmichael number. Alford took a different approach:


• Find a subset $P_0$ of $P$ such that for every reduced residue $a \pmod L$ there
exist distinct $q_1, \ldots, q_s \in P_0$ for which $q_1 \cdots q_s \equiv a \pmod L$.

Now if $\{p_1, \ldots, p_r\}$ is any non-empty subset of $P \setminus P_0$, then let $a$ be the inverse of
$p_1 \cdots p_r \pmod L$. We have $q_1, \ldots, q_s \in P_0$ for which $q_1 \cdots q_s \equiv a \pmod L$, and so
$$N = p_1 \cdots p_r \cdot q_1 \cdots q_s \equiv a^{-1} \cdot a = 1 \pmod L.$$
Therefore $N$ is a Carmichael number. So if there are $\ell$ primes in $P_0$ and $k$ primes
in $P$, then Alford's idea yields $2^{k-\ell} - 1$ Carmichael numbers.


Alford worked with a particular choice of $L$, a product of powers of the primes 2, 3, 5, 7, and 11, and found that there are
155 primes $p$ for which $p - 1$ divides $L$ (that is, in $P$). By exhaustive calculations
he showed that $P_0$ can be taken to be the smallest 27 primes in $P$, and therefore he
deduced, on January 21, 1992, that there are at least $2^{128} - 1$ Carmichael numbers.
This greatly increased the number known, to many more Carmichael numbers
than one could ever even hope to write down.¹⁷
This inspired Granville and Pomerance to work with Alford to modify his argument
suitably so as to establish that there are infinitely many Carmichael numbers: If
$x$ is sufficiently large, then there are more than $x^{2/7}$ Carmichael numbers up to $x$,
which was subsequently improved by Harman to $x^{1/3}$. The ideas in the proof can
be modified to show that for any integer $B$, there are infinitely many Carmichael
numbers for which the least witness is $> B$.¹⁸


Carmichael numbers remain scarce throughout the range that has been computed (up to $10^{21}$; see the table below), which is surprising
if Erdős's conjecture that there are more than $x^{1-\epsilon}$ Carmichael numbers up to $x$
once $x$ is sufficiently large is to be believed. Indeed Shanks challenged
those who believe Erdős's conjecture to produce a value of $x$ for which there are
more than $x^{1/2}$ Carmichael numbers up to $x$.¹⁹


It is believed that there are about $\pi(x)/\phi(m)$ primes $\equiv 1 \pmod m$ up to $x$ once
$x > m^{1+\epsilon}$ (with $m$ sufficiently large), and therefore the number should certainly
be $\ge \pi(x)/2\phi(m)$. It has been shown that if this is true, then it implies Erdős's
conjecture that there are more than $x^{1-\epsilon}$ Carmichael numbers up to $x$ once $x$ is
sufficiently large. This shows that Erdős was almost certainly correct and Shanks
wrong, but it still flies in the face of the computational evidence. In [2] the authors
try to reconcile our theoretical understanding with the data.


17 What one might call "infinity in practice".

18 Certain well-known software packages did, at that time, assert that a given integer is prime
if it is a strong pseudoprime for some given finite set of bases. This result shows that such software
misidentified some composites as primes.

19 Up to $x = 10^{21}$, there are only a few more than $x^{1/3}$ Carmichael numbers.




The computational evidence 


Richard Pinch has made extensive calculations of Carmichael numbers. For example,
the number $C(x)$ of Carmichael numbers up to $x$, for $x = 10^j$ with $3 \le j \le 21$, is
given in the following table:


x    | 10^3 | 10^4 | 10^5 | 10^6 | 10^7 | 10^8 | 10^9 | 10^10
C(x) | 1    | 7    | 16   | 43   | 105  | 255  | 646  | 1547

x    | 10^11 | 10^12 | 10^13 | 10^14 | 10^15  | 10^16
C(x) | 3605  | 8241  | 19279 | 44706 | 105212 | 246683

x    | 10^17  | 10^18   | 10^19   | 10^20   | 10^21
C(x) | 585355 | 1401644 | 3381806 | 8220777 | 20138200


We expect $C(x) > x^{1-\epsilon}$, so it is troubling that $C(10^{21})$ is only a little bigger
than $(10^{21})^{1/3} = 10^7$. We do not know how large $x$ needs to be to have $C(x) > \sqrt{x}$.
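The first few entries of the table can be reproduced with a short search. The sketch below sieves smallest prime factors, then applies Korselt's criterion (squarefree, with $p - 1 \mid n - 1$ for every $p \mid n$) to each odd composite. Pinch's computations use far more sophisticated methods. This brute force only reaches about $10^6$ in reasonable time. The function name is illustrative.

```python
def carmichael_numbers(limit):
    """All Carmichael numbers n <= limit, as pairs (n, number of prime factors)."""
    spf = list(range(limit + 1))            # smallest-prime-factor sieve
    for i in range(2, int(limit ** 0.5) + 1):
        if spf[i] == i:
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
    result = []
    for n in range(3, limit + 1, 2):        # Carmichael numbers are odd
        if spf[n] == n:
            continue                        # n is prime
        m, primes = n, []
        while m > 1:
            p = spf[m]
            m //= p
            if m % p == 0:
                break                       # p^2 divides n: not squarefree
            primes.append(p)
        else:
            if all((n - 1) % (p - 1) == 0 for p in primes):
                result.append((n, len(primes)))
    return result


if __name__ == "__main__":
    cs = carmichael_numbers(10**6)
    for j in range(3, 7):
        print(f"C(10^{j}) =", sum(1 for n, _ in cs if n <= 10**j))
```

The counts agree with the first four columns of the table, and the number of prime factors reproduces the $C_k(10^j)$ entries below for $j \le 6$.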

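Another important issue is to understand how the number of prime factors of
typical Carmichael numbers grows with $x$. Let $C_k(x)$ denote the number of Carmichael
numbers up to $x$ with exactly $k$ prime factors. It is believed that $C_k(x)$ is roughly $x^{1/k}$
for each fixed $k$, once $x$ is sufficiently large.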


Carmichael numbers up to $10^{21}$ have at most 12 prime factors. In the next
table we give (for typographical reasons) the values of $C_k(10^j)$ only for $k \le 9$:


x     | C_3(x) | C_4(x) | C_5(x) | C_6(x)  | C_7(x)  | C_8(x)  | C_9(x)
10^3  | 1      | 0      | 0      | 0       | 0       | 0       | 0
10^4  | 7      | 0      | 0      | 0       | 0       | 0       | 0
10^5  | 12     | 4      | 0      | 0       | 0       | 0       | 0
10^6  | 23     | 19     | 1      | 0       | 0       | 0       | 0
10^7  | 47     | 55     | 3      | 0       | 0       | 0       | 0
10^8  | 84     | 144    | 27     | 0       | 0       | 0       | 0
10^9  | 172    | 314    | 146    | 14      | 0       | 0       | 0
10^10 | 335    | 619    | 492    | 99      | 2       | 0       | 0
10^11 | 590    | 1179   | 1336   | 459     | 41      | 0       | 0
10^12 | 1000   | 2102   | 3156   | 1714    | 262     | 7       | 0
10^13 | 1858   | 3639   | 7082   | 5270    | 1340    | 89      | 1
10^14 | 3284   | 6042   | 14938  | 14401   | 5359    | 655     | 27
10^15 | 6083   | 9938   | 29282  | 36907   | 19210   | 3622    | 170
10^16 | 10816  | 16202  | 55012  | 86696   | 60150   | 16348   | 1436
10^17 | 19539  | 25758  | 100707 | 194306  | 172234  | 63635   | 8835
10^18 | 35586  | 40685  | 178063 | 414660  | 460553  | 223997  | 44993
10^19 | 65309  | 63343  | 306310 | 849564  | 1159167 | 720406  | 196391
10^20 | 120625 | 98253  | 514381 | 1681744 | 2774702 | 2148017 | 762963
10^21 | 224763 | 151566 | 846627 | 3230120 | 6363475 | 6015901 | 2714473


The median size of $k$, the number of prime factors, seems to grow quite rapidly.
If each $C_k(x)$ is roughly $x^{1/k}$, we should have $C_3(x) > C_4(x) > \cdots > C_k(x)$ once $x$
is sufficiently large, but in the table we do not even have $C_3(x) > C_4(x)$ until $x = 10^{19}$.


References for this chapter
[1] W. R. Alford, Andrew Granville, and Carl Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. 139 (1994), 703–722.

[2] Andrew Granville and Carl Pomerance, Two contradictory conjectures concerning Carmichael numbers, Math. Comp. 71 (2002), 883–908.


Appendix 10E. Cryptosystems based on discrete logarithms


Given a primitive root $g \bmod p$ and a reduced residue $a \pmod p$, we know that
there exists an integer $k$ for which $a \equiv g^k \pmod p$. We write $k := \operatorname{ind}_g(a)$, the
discrete log of $a \bmod p$ in base $g$, which is well-defined mod $p - 1$ (see section 7.16
of appendix 7B). The problem of efficiently determining $k$ (given $a$, $g$, and $p$) is
called the discrete log problem and seems to be difficult. It has therefore been used
as the basis for various cryptographic protocols.


10.13. The Diffie-Hellman key exchange 


Alice and Bob wish to create a secret number that they both know, without meeting.
To do so, they must share information across an open channel with Oscar listening:


• They agree upon a large prime $p$ and primitive root $g$.
• Alice picks her secret exponent $a$, and Bob picks his secret exponent $b$.
• Alice transmits the least positive residue of $g^a \pmod p$ to Bob, and Bob
transmits the least positive residue of $g^b \pmod p$ to Alice.
• The secret key is the least residue of $g^{ab} \pmod p$. Alice computes it as $g^{ab} \equiv (g^b)^a \pmod p$,
and Bob computes it as $g^{ab} \equiv (g^a)^b \pmod p$.

Oscar has access to $p$ as well as to $g$, $g^a$, and $g^b \pmod p$, and he wishes to determine
$g^{ab} \pmod p$. The only obvious way to proceed, with the information that he has,
is to compute the discrete logarithm of $g^a$ or $g^b$ in base $g$, to recover $a$ or $b$ or both,
and hence to determine $(g^b)^a$ or $(g^a)^b$ or $g^{ab} \pmod p$, respectively. Notice that, in
this exchange, Bob never knows Alice's secret exponent $a$, and Alice never knows
Bob's secret exponent $b$.
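The exchange can be traced with toy numbers. The sketch below uses the hypothetical parameters $p = 23$, $g = 5$, $a = 4$, $b = 3$, which are far too small for real use. It also shows Oscar's brute-force discrete log, which is instant here but infeasible for a large prime $p$.

```python
def is_primitive_root(g, p):
    """g is a primitive root mod prime p iff g^((p-1)/q) != 1 for every prime q | p - 1."""
    m, qs, d = p - 1, [], 2
    while d * d <= m:
        if m % d == 0:
            qs.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        qs.append(m)
    return all(pow(g, (p - 1) // q, p) != 1 for q in qs)


# Hypothetical toy parameters.
p, g = 23, 5
a, b = 4, 3                   # Alice's and Bob's secret exponents
A = pow(g, a, p)              # Alice sends g^a mod p
B = pow(g, b, p)              # Bob sends g^b mod p
alice_key = pow(B, a, p)      # Alice computes (g^b)^a
bob_key = pow(A, b, p)        # Bob computes (g^a)^b

# Oscar's only obvious route: a discrete log by exhaustive search.
oscar_a = next(k for k in range(p - 1) if pow(g, k, p) == A)

print(A, B, alice_key, bob_key, oscar_a)   # 4 10 18 18 4
```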
There is a lot more that can be said, for example how to stop a "man-in-the-middle"
attack. That is, Oscar can get in between Alice and Bob on their
communication channel and hence pretend to be Bob when dealing with Alice and
send her his own value in place of $g^b$, and similarly pretend to be Alice when dealing with Bob and
send him his own value in place of $g^a$. It is difficult to stop, or even to recognize, such fraudulent
behavior, but there are well-established protocols for dealing with this, and other,
difficult situations.


10.14. The El Gamal cryptosystem 


This public key cryptosystem uses the secret key $g^{ab} \pmod p$ from above:


• Alice wishes to transmit a message $M$ to Bob. She creates the ciphertext
$x \equiv M/g^{ab} \pmod p$ and transmits that.


• Bob determines the original message $M$ by computing $x g^{ab} \pmod p$.


Oscar has access to $p$ as well as $g$, $g^a$, and $g^b \pmod p$, and the ciphertext $x$.
Determining $M \equiv x g^{ab} \pmod p$ is therefore equivalent to determining $g^{ab} \pmod p$.
This is the same mathematical problem as in the Diffie-Hellman key exchange.
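The sketch below continues the same hypothetical toy parameters ($p = 23$, $g = 5$, $a = 4$, $b = 3$). Alice precomputes $g^{-ab}$ once, using `pow(x, -1, p)` (Python 3.8+), so each encryption is a single modular multiplication.

```python
p, g = 23, 5                            # hypothetical toy parameters
a, b = 4, 3
shared = pow(pow(g, b, p), a, p)        # g^(ab) mod p
shared_inv = pow(shared, -1, p)         # g^(-ab) mod p, computed once and for all


def encrypt(M):
    """Alice: x = M / g^(ab) (mod p), one multiplication per message."""
    return M * shared_inv % p


def decrypt(x):
    """Bob: M = x * g^(ab) (mod p)."""
    return x * shared % p


print(shared, shared_inv, encrypt(7), decrypt(encrypt(7)))   # 18 9 17 7
```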


Why choose one cryptosystem over another? This is an important practical
question, especially as we are unable to prove that any particular cryptosystem
is truly secure (since, for all we know, there may be polynomial time algorithms
for factoring or for solving the discrete log problem). Most people who are not
directly involved in selling a particular product would guess that RSA is the safest,
since factoring is a much better explored problem than discrete logs. However,
the El Gamal system has a distinct advantage over RSA, which is the quantity
and difficulty of the calculations involved in implementing the algorithms: Let us
compare the cryptosystems if Alice is regularly communicating with Bob. In RSA
she must raise $M$ to the power $e$ each time she transmits a message, which requires
around $\log e$ multiplications modulo the RSA modulus. It would probably be best to choose $e$ to
be large, that is, of length comparable to the length of the modulus, to ensure that RSA is
most likely to be secure. On the other hand, in the El Gamal cryptosystem, Alice
can compute $g^{ab}$ and $g^{-ab} \bmod p$ once and for all, so that when she transmits
her ciphertext she simply multiplies $M$ by $g^{-ab} \pmod p$: one multiplication. This
difference may not be so important if Alice works with a large computer, but many
applications today use handheld devices, like a cellphone or a smartcard, which
have limited computing capacity, so this time difference can be very significant.


To get the best of both worlds, one might choose to use a possibly not so 
secure but fast cryptosystem to exchange most messages, with regular changes of 
key, exchanged in a highly secure manner (because, traditionally, cryptographers 
have used similarities between many different ciphertexts created with the same 
key to expose flaws in cryptographic protocols). Therefore one might use the El 
Gamal cryptosystem on a day-to-day basis, while changing keys regularly using the 
Diffie-Hellman key exchange protocol. 

Because of its high computational costs, RSA is usually used only for highly
secure messages, such as key exchanges. In practice, users take $e = 2^{16} + 1$, not too
small but certainly not large, to simplify calculations (so we can efficiently compute
the power $M^e \pmod n$ as described in section 7.13 of appendix 7A).
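Taking $e = 2^{16} + 1$ keeps the cost small because its binary expansion has just two 1 bits. The sketch below is a left-to-right square-and-multiply that counts modular multiplications, squarings included. For $e = 2^{16} + 1$ it uses 16 squarings and one extra multiplication. The modulus 999983 is an arbitrary illustrative choice.

```python
def power_mod_counted(M, e, n):
    """Left-to-right square-and-multiply for M^e mod n (e >= 1).

    Returns (M^e mod n, number of modular multiplications, squarings included)."""
    bits = bin(e)[2:]                       # binary digits of e, leading bit first
    result, mults = M % n, 0                # the leading 1 bit contributes M itself
    for bit in bits[1:]:
        result = result * result % n        # one squaring per remaining bit
        mults += 1
        if bit == "1":
            result = result * M % n         # one extra multiplication per further 1 bit
            mults += 1
    return result, mults


print(power_mod_counted(12345, 2**16 + 1, 999983)[1])   # 17
```

A "large" exponent of $k$ bits would instead cost between $k - 1$ and $2(k - 1)$ multiplications. This is the cost that the text contrasts with El Gamal's single multiplication.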


For more about number theory algorithms, see the masterful book [CP05].


Appendix 10F. Running times of algorithms


10.15. P and NP 


One should distinguish between a mathematical problem and the possible algo- 
rithms for resolving that problem. There may be many choices of algorithm and 
one wishes, of course, to find a fast one. We denote by P the class of problems that 
can be resolved by an algorithm that runs in polynomial time. There are very few 
mathematical problems which belong to P. 


In section 10.4 we discussed problems that have been resolved, for which the
answer can be quickly checked. For example, one can exhibit factors of a given
integer $n$ to give a short proof that $n$ is composite. We also saw Lucas's short
proof that a number is prime, based on the fact that only prime numbers $n$ have a
primitive root, that is, a residue whose powers give all $n - 1$ nonzero residues mod $n$. By "short" we mean that the proof can
be verified in polynomial time, and we say that such problems are in the class NP
("non-deterministic polynomial time"²⁰). We are not suggesting that the proof can
be found in polynomial time, only that the proof can be checked in polynomial
time; indeed we have no idea whether it is possible to factor numbers in polynomial
time, and this is now the outstanding number theory problem of this area.


By definition $P \subseteq NP$; and of course we believe that there are problems, for
example the factoring problem, which are in NP but not in P; however, this has not
been proved, and it is now perhaps the outstanding unresolved question of
theoretical computer science. This is another of the Clay Mathematics Institute's million
dollar problems, and perhaps the most likely to be resolved by someone with less
formal training, since the experts seem to have few plausible ideas for attacking
this question.


20 Note that NP is not "non-polynomial time", a common source of confusion. In fact it is
"non-deterministic polynomial time", because the method for discovering the proof is not necessarily
determined.




It had better be the case that $P \ne NP$, or else there is little chance that one can
have safe public key cryptography (see, e.g., section 10.3), or that one could build a
highly unpredictable pseudorandom number generator,²¹ or that we could have any
one of several other necessary software tools for computers.


To have a good cryptosystem we want Bob to be able to decipher Alice's
ciphertext quickly, and so the cryptosystem should be based on a number theory
problem in the complexity class NP. However, to stop Oscar being able to crack the
cryptosystem, the number theory problem had better be difficult to solve, so should
not be in the complexity class P. Therefore, to have safe cryptography, it appears
that we need to have $P \ne NP$. This question remains unresolved, and so no fast public
key cryptographic protocol is, as yet, provably safe!


10.16. Difficult problems 


There are only a finite number of possible commands for each line of a computer
program, which therefore induce a finite number of possible states for the number
and values of the variables.²² It is therefore easy to show that most problems need
exponential length programs to be solved:


We consider the set of problems where we input $N$ bits and output one bit,
that is, functions
$$f : \{0, 1\}^N \to \{0, 1\}.$$
Since there are $2^N$ possible inputs, and for each the function can have two possible
outputs, the number of such functions is $2^{2^N}$.

If a computer language allows $M$ different possible statements on each line,
then the number of programs containing $k$ lines is $M^k$, and this is therefore a
bound on the number of functions that can be calculated by a computer program
that is $k$ lines long. Therefore if $k \le c_M \cdot 2^N$, where $c_M := \frac{1}{2\log_2 M}$, then there are
$\le M^{c_M 2^N} = \sqrt{2^{2^N}}$ programs that are $k$ lines long, and so we can compute no more than $\sqrt{2^{2^N}}$
functions with a $k$-line program in this language. Hence the vast majority of such
problems require a program of length at least $c_M \cdot 2^N$. (Notice here that $M$, and
thus $c_M$, is fixed by the computer language, and $N$ is varying.)


Since almost all problems require such long programs, exponential in the length 
of the input, one would think that it would be easy to specify problems that need 
longish programs. However this is a wide open problem. Indeed even finding specific 
problems that cannot be resolved in polynomial time is open, or even problems that 
really require more than linear time! This is the pathetic state of our knowledge 
on lower bounds for running times, in practice. So if you ever hear claims that 
some secret code is provably difficult to break, that your secrets are perfectly safe, 
then either there has been a major scientific breakthrough or you are hearing a 
salesman’s jibber-jabber, not mathematical proof. 


21 So-called "random number generators" cannot be random, as they run on a computer where
everything is designed to be determined! In reality they create a sequence of integers, determined in a totally
predictable manner, but which appear to be random when subjected to "randomness tests" in which
the tester does not know how the sequence was generated. See Oded Goldreich, Pseudorandomness,
Notices Amer. Math. Soc. 46 (1999), 1209–1216.

22 Here we are talking about a classical computer. As yet impractical quantum computers face fewer
restrictions and thus, perhaps, will allow more things to be computed rapidly.


Appendix 10G. The AKS test 


Agrawal, Kayal, and Saxena’s work starts with the following characterization of 
prime numbers. 


Theorem 10.5. $n$ is prime if and only if $(x + 1)^n \equiv x^n + 1 \pmod n$ in $\mathbb{Z}[x]$.


Proof. Since $(x+1)^n - (x^n + 1) = \sum_{1 \le j \le n-1} \binom{n}{j} x^j$, we have that $x^n + 1 \equiv (x+1)^n \pmod n$
if and only if $n$ divides $\binom{n}{j}$ for all $j$ in the range $1 \le j \le n - 1$.

If $n = p$ is prime and $1 \le j \le p - 1$, then $p$ appears in the numerator of $\binom{p}{j} = \frac{p!}{j!\,(p-j)!}$, but
is larger than, and so does not divide, any term in the denominator, and therefore
$p$ divides $\binom{p}{j}$.

If $n$ is composite, let $p$ be a prime dividing $n$. In the expansion
$$\binom{n}{p} = \frac{n(n-1)(n-2)\cdots(n-(p-1))}{p!},$$
the only terms divisible by $p$ are the $n$ in the numerator and the $p$ in the
denominator, and so if $p^k$ is the largest power of $p$ dividing $n$, then $p^{k-1}$ is the largest
power of $p$ dividing $\binom{n}{p}$. Therefore $n$ does not divide $\binom{n}{p}$, and so we deduce that
$(x + 1)^n \not\equiv x^n + 1 \pmod n$.
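The proof reduces Theorem 10.5 to divisibility of binomial coefficients, which is easy to test for small $n$. The sketch below checks that $n \mid \binom{n}{j}$ for all $1 \le j \le n - 1$ exactly when $n$ is prime. Because it inspects about $n$ coefficients, it is hopeless as a practical test for large $n$, which is the point made next. The function names are illustrative, and `math.comb` requires Python 3.8+.

```python
from math import comb


def is_prime(n):
    """Trial division, used only for comparison."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def binomial_criterion(n):
    """Theorem 10.5: (x+1)^n = x^n + 1 (mod n) iff n | C(n, j) for all 1 <= j <= n - 1."""
    return all(comb(n, j) % n == 0 for j in range(1, n))


print([n for n in range(2, 40) if binomial_criterion(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
```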


This simple theorem is the basis of the AKS primality test. However, we cannot
quickly calculate $(x + 1)^n - (x^n + 1) \pmod n$ and determine whether or not $n$
divides each coefficient, since this polynomial has around $n$ coefficients, so computing $(x+1)^n \pmod n$
directly is far too slow. To reduce the number of coefficients involved, we might
compute modulo some small degree polynomial as well as mod $n$, so that neither the
coefficients nor the degree in the calculation gets large. The simplest polynomial
of degree $r$ is perhaps $x^r - 1$. So why not verify whether
$$(x + 1)^n \equiv x^n + 1 \mod (n, x^r - 1)?$$




Here $f(x) \equiv g(x) \mod (n, h(x))$ for some $f(x), g(x), h(x) \in \mathbb{Z}[x]$ means that there
exist polynomials $u(x), v(x) \in \mathbb{Z}[x]$ for which $f(x) - g(x) = n u(x) + h(x) v(x)$. In
other words, $(n, h(x))$ is the ideal of $\mathbb{Z}[x]$ generated by $n$ and $h(x)$.

This new congruence can be computed rapidly and it is true for any prime n 
(as a consequence of the theorem above), but it is unclear whether this fails to 
hold for all composite n and thus gives a true primality test. The main theorem of 
Agrawal, Kayal and Saxena provides a modification of this congruence, which can 
be shown to succeed for primes and fail for composites, thus providing a polynomial 
time primality test. 
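Here is a minimal sketch of why the reduced congruence is cheap. Polynomials are stored as length-$r$ coefficient lists mod $n$, and multiplication wraps exponents mod $r$ (since $x^r \equiv 1$). Repeated squaring then computes $(x + a)^n$ with about $\log_2 n$ products of size $r^2$, instead of expanding $n + 1$ binomial coefficients. The function names are hypothetical. `naive_holds` is a slow reference used only to check the fast version.

```python
from math import comb


def polymul_mod(f, g, n, r):
    """Product of f and g in (Z/nZ)[x]/(x^r - 1); inputs are length-r coefficient lists."""
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
    return h


def aks_congruence_holds(n, r, a):
    """Check (x + a)^n = x^n + a  mod (n, x^r - 1) by repeated squaring (n >= 2, r >= 2)."""
    base = [0] * r
    base[0], base[1] = a % n, 1 % n         # the polynomial x + a
    result = [0] * r
    result[0] = 1 % n                       # the polynomial 1
    e = n
    while e:
        if e & 1:
            result = polymul_mod(result, base, n, r)
        base = polymul_mod(base, base, n, r)
        e >>= 1
    rhs = [0] * r
    rhs[n % r] = 1 % n                      # x^n reduces to x^(n mod r)
    rhs[0] = (rhs[0] + a) % n
    return result == rhs


def naive_holds(n, r, a):
    """Same check by expanding (x + a)^n with the binomial theorem, then reducing."""
    lhs = [0] * r
    for j in range(n + 1):
        lhs[j % r] = (lhs[j % r] + comb(n, j) * pow(a, n - j, n)) % n
    rhs = [0] * r
    rhs[n % r] = 1 % n
    rhs[0] = (rhs[0] + a) % n
    return lhs == rhs
```

For example, with $n = 4$, $r = 3$, $a = 1$, the left side reduces to $2x^2 + x + 1$ but the right side is $x + 1$, so the congruence fails.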


Exercise 10.17.1. Suppose that $(a, n) = 1$. Prove that $n$ is prime if and only if $(x + a)^n \equiv x^n + a \pmod n$
in $\mathbb{Z}[x]$.


10.17. A computationally quicker characterization of the primes 


Lemma 10.17.1. For any given integer $n > 1$ there exists an integer $r$ in the range
$2 \le r \le R$ such that $n$ has order $> \sqrt{R}/\log n$ modulo $r$.


Proof. If $n$ has order $m$ mod $r$, then $r$ divides $n^m - 1$. For each prime $p$ let $p^e$ be the largest power of $p$ that is $\le R$, and suppose that $n$ has order $\le M$ mod $p^e$. Therefore $p^e$ divides $\prod_{m \le M}(n^m - 1)$, and therefore the product of the $p^e$ also divides it. Therefore, by (6.5.6) we have

$$2^R \le \operatorname{lcm}[m : m \le R] = \prod_p p^e < \prod_{m \le M} n^m = n^{(M^2+M)/2}.$$


This yields a contradiction if $R > M^2 \log n$ (as $\log 2 > \frac12$).
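To see Lemma 10.17.1 in action, the Python sketch below (helper names are ours) searches $r = 2, 3, \ldots$ for the smallest $r$ coprime to $n$ at which the order of $n$ exceeds a given bound. It assumes natural logarithms. With the bound $(3\log n)^2$ used in what follows, taking $R = 81(\log n)^5$ in the lemma says the search should stop at some $r \le 81(\log n)^5$.

```python
import math


def multiplicative_order(n, r):
    """Order of n modulo r; assumes r >= 2 and gcd(n, r) = 1."""
    k, power = 1, n % r
    while power != 1:
        power = (power * n) % r
        k += 1
    return k


def smallest_r(n, bound):
    """Smallest r >= 2 with gcd(n, r) = 1 and ord_r(n) > bound (needs n > 1)."""
    r = 2
    while True:
        if math.gcd(n, r) == 1 and multiplicative_order(n, r) > bound:
            return r
        r += 1


n = 1000
bound = (3 * math.log(n)) ** 2
r = smallest_r(n, bound)
print(r, multiplicative_order(n, r), r <= 81 * math.log(n) ** 5)
```

The linear search is crude, but the lemma keeps $r$ polynomial in $\log n$, so the search costs only polynomially many steps in $\log n$.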


Given an integer $n > 2$, let $r$ be a positive integer for which $n$ has order $d$ modulo $r$, where $d > (3\log n)^2$. Lemma 10.17.1 implies that there is such an $r \le 81(\log n)^5$. Since we wish to test whether $n$ is prime we will assume that

• $n$ is not a perfect power,

• $n$ does not have any prime factor $\le r$,

• $(x+a)^n \equiv x^n + a \mod (n, x^r - 1)$ for each integer $a$ with $1 \le a \le A$, where $A = [3\sqrt{r}\log n]$.

All of these criteria hold for prime $n > r$ (by exercise 10.17.1), and we wish to show that they do not all hold for composite $n$, and therefore prove Theorem 10.1.


10.18. A set of extraordinary congruences 

Let $n$ be a composite integer as above, and let $p$ be a prime dividing $n$, so that $n$ is not a power of $p$ (as $n$ is not a perfect power). Moreover

$$(x+a)^n \equiv x^n + a \mod (p, x^r - 1) \tag{10.18.1}$$

for each integer $a$, $1 \le a \le A$. We can factor $x^r - 1$ into irreducibles in $\mathbb{Z}[x]$ as $\prod_{d \mid r} \Phi_d(x)$, where $\Phi_d(x)$ is the $d$th cyclotomic polynomial (as in appendix 4), whose roots are the primitive $d$th roots of unity. Each $\Phi_d(x)$ is irreducible in $\mathbb{Z}[x]$




but may not be irreducible in $(\mathbb{Z}/p\mathbb{Z})[x]$; so let $h(x)$ be an irreducible factor of $\Phi_r(x) \pmod p$. Then (10.18.1) implies that

$$(x+a)^n \equiv x^n + a \mod (p, h(x)) \tag{10.18.2}$$

for each integer $a$, $1 \le a \le A$, since the ideal $(p, h(x))$ divides the ideal $(p, x^r - 1)$.


The congruence classes mod $(p, h(x))$ can be viewed as the elements of a field $F$, so that the congruences (10.18.2) are much easier to work with than (10.18.1), where the congruences do not correspond to a field. Note that $x$ has order $r$ in $F$, as $h(x)$ divides $x^r - 1$ but not $x^d - 1$ for any $d < r$.

Let $H$ be the elements mod $(p, x^r - 1)$ generated multiplicatively by $x, x+1, x+2, \ldots, x+A$. Let $G$ be the (cyclic) subgroup of $F$ generated multiplicatively by $x, x+1, x+2, \ldots, x+A$; in other words $G$ is the reduction of $H$ mod $(p, h(x))$. All of the elements of $G$ are non-zero, for if $x + a = 0$ in $F$, then $x^n + a = (x+a)^n = 0$ in $F$ by (10.18.2), so that $x^n = -a = x$ in $F$, which would imply that $n \equiv 1 \pmod r$ (as $x$ has order $r$ in $F$). However this would mean that $n$ has order 1 mod $r$, contradicting our assumptions.


Our plan is to give upper and lower bounds on the size of G to establish a 
contradiction. 


Upper bounds on |G| 


In this subsection we will establish that

$$|G| \le n^{2\sqrt{|R|}} - 1, \tag{10.18.3}$$

where $R$ is the subgroup of the multiplicative group of reduced residues mod $r$ generated by $n$ and $p$.


Define $S$ to be the set of positive integers $k$ for which
$$g(x^k) \equiv g(x)^k \mod (p, x^r - 1) \quad \text{for all } g \in H.$$
We can deduce that $g(x^k) = g(x)^k$ in $F$ for each $k \in S$.

Lemma 10.18.1. If $a, b \in S$, then $ab \in S$.

Proof. If $g(x) \in H$, then $g(x^b) \equiv g(x)^b \mod (p, x^r - 1)$ since $b \in S$; and so, replacing $x$ by $x^a$, we get $g((x^a)^b) \equiv g(x^a)^b \mod (p, (x^a)^r - 1)$, and therefore mod $(p, x^r - 1)$ since $x^r - 1$ divides $x^{ar} - 1$. Therefore, as $a \in S$ we have
$$g(x)^{ab} = (g(x)^a)^b \equiv g(x^a)^b \equiv g((x^a)^b) = g(x^{ab}) \mod (p, x^r - 1),$$
and so $ab \in S$ as desired.


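Corollary 10.18.1. $R \subseteq S$.

Proof. Evidently $p \in S$, since $g(x)^p \equiv g(x^p) \pmod p$ for any $g \in \mathbb{Z}[x]$. Moreover $n \in S$ since if $g(x) = \prod_{0 \le a \le A}(x+a)^{e_a} \in H$, then
$$g(x)^n = \prod_a \big((x+a)^n\big)^{e_a} \equiv \prod_a (x^n + a)^{e_a} = g(x^n) \mod (p, x^r - 1)$$
by (10.18.1). But then, $R \subseteq S$ by Lemma 10.18.1.

Lemma 10.18.2. If $a, b \in S$ and $a \equiv b \bmod r$, then $a \equiv b \bmod |G|$.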




Proof. For any $g(x) \in \mathbb{Z}[x]$ we have that $u - v$ divides $g(u) - g(v)$. Therefore (taking $a \ge b$, as we may) $x^r - 1$ divides $x^{a-b} - 1$, which divides $x^a - x^b$, which divides $g(x^a) - g(x^b)$; and so we deduce that if $g(x) \in H$, then $g(x)^a \equiv g(x^a) \equiv g(x^b) \equiv g(x)^b \mod (p, x^r - 1)$. Thus if $g(x) \in G$, then $g(x)^{a-b} = 1$ in $F$.

As $G$ is a cyclic group we can take $g$ to be a generator of $G$. Then $g$ has order $|G|$, and so $|G|$ divides $a - b$.


Proof of (10.18.3). Since $n$ is not a power of $p$, the integers $n^i p^j$ with $i, j \ge 0$ are distinct integers. There are $> |R|$ such integers with $0 \le i, j \le \sqrt{|R|}$, and their residues mod $r$ all lie in $R$, so two must be congruent (mod $r$), say
$$n^i p^j \equiv n^I p^J \pmod r.$$
By Corollary 10.18.1 these integers are both in $S$. By Lemma 10.18.2 their difference is divisible by $|G|$, and therefore
$$|G| \le |n^i p^j - n^I p^J| \le (np)^{\sqrt{|R|}} - 1 < n^{2\sqrt{|R|}} - 1,$$
as $p < n$, since $p$ is a proper divisor of $n$.
Lower bounds on |G| 


Lemma 10.18.3. Suppose that $f(x), g(x) \in \mathbb{Z}[x]$ both have degree $< |R|$, their reductions in $F$ both belong to $G$, and $f(x) \equiv g(x) \mod (p, h(x))$. Then $f(x) \equiv g(x) \pmod p$.

Proof. Consider the polynomial $\Delta(y) := f(y) - g(y)$ reduced in $F$. If $k \in S$, then
$$\Delta(x^k) = f(x^k) - g(x^k) = f(x)^k - g(x)^k \equiv 0 \mod (p, h(x)).$$
Therefore $\{x^k : k \in R\}$ are all distinct roots of the polynomial $\Delta(y)$ in $F$, as $x$ has order $r$ in $F$. Therefore $\Delta(y)$ has degree $< |R|$, with $\ge |R|$ distinct roots in $F$, and so $\Delta(y) = 0$ in $F$ by Lagrange’s Theorem (Proposition 7.4.1). This implies that $\Delta(y) \equiv 0 \pmod p$, as its coefficients are independent of $x$.


Now $1, n, n^2, \ldots, n^{d-1} \in R$, and so $|R| \ge d > (3\log n)^2$. Therefore $A \ge B$ and $|R| > B$, where $B := [3\sqrt{|R|}\log n]$. The products $\prod_{a \in T}(x+a)$ lie in $G$ for every subset $T$ of $\{1, 2, \ldots, B\}$ and are distinct by Lemma 10.18.3. Therefore
$$|G| \ge 2^B > n^{2\sqrt{|R|}},$$
which contradicts (10.18.3). This completes the proof of Theorem 10.1.




Appendix 10H. Factoring algorithms for polynomials


10.19. Testing polynomials for irreducibility 


There are few tools to prove that a polynomial $f(x) \in \mathbb{Z}[x]$ is irreducible (though we saw a particularly elegant criterion in Theorem 5.5). The most effective tools are based on the fact that if $f$ is irreducible modulo some prime $p$ that does not divide its leading coefficient, then $f$ is irreducible (since if $f$ is reducible, say $f = gh$, then $f \equiv gh \pmod p$ for every prime $p$). For example $x^2 + 1$ is irreducible mod 3, so it is irreducible in $\mathbb{Z}[x]$, as well as in $\mathbb{Q}[x]$ (by Lemma 3.22.2).

One can develop this idea further: $f$ is irreducible if its possible factorizations into irreducibles, modulo two different primes, cannot possibly correspond to one another. For example $x^4 + x^2 + 2x - 1$ factors into irreducibles as $(x^2 + x + 1)^2 \pmod 2$ and as $(x - 1)(x^3 + x^2 - x + 1) \pmod 3$. So if $x^4 + x^2 + 2x - 1 = g(x)h(x)$ with $g, h$ non-constant, then $\deg g = \deg h = 2$ by the reduction mod 2, but this does not correspond to the factorization mod 3, and so $x^4 + x^2 + 2x - 1$ is irreducible.
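The two factorizations quoted above are easy to double-check by machine. The Python sketch below uses helpers of our own, with coefficient lists indexed by degree and the constant term first. It multiplies the claimed factors and compares the result with $f$ reduced mod 2 and mod 3. It also confirms that $x^2 + x + 1$ has no root mod 2 and that $x^3 + x^2 - x + 1$ has no root mod 3. For polynomials of degree at most 3, having no root is equivalent to being irreducible.

```python
def mul_mod_p(f, g, p):
    """Product of coefficient lists f and g (index = degree), reduced mod p."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return [c % p for c in h]


def has_root_mod_p(f, p):
    """True if f(t) == 0 (mod p) for some residue t."""
    return any(sum(c * t**i for i, c in enumerate(f)) % p == 0 for t in range(p))


f = [-1, 2, 1, 0, 1]        # x^4 + x^2 + 2x - 1
quad = [1, 1, 1]            # x^2 + x + 1
cubic = [1, -1, 1, 1]       # x^3 + x^2 - x + 1

print(mul_mod_p(quad, quad, 2) == [c % 2 for c in f])       # factorization mod 2
print(mul_mod_p([-1, 1], cubic, 3) == [c % 3 for c in f])   # factorization mod 3
print(has_root_mod_p(quad, 2), has_root_mod_p(cubic, 3))    # both False
```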


These techniques are not guaranteed to always work; for example, one can factor the irreducible polynomial $x^4 + 1$ into the product of two degree-two polynomials modulo any prime $p$:


Exercise 10.19.1. (a) Factor $x^4 + 1 \pmod 2$.
(b) If prime $p \equiv 1 \pmod 4$, show that we can factor $x^4 + 1$ as $(x^2 + b)(x^2 - b) \pmod p$ for some value of $b \pmod p$.
(c) If prime $p \equiv 3 \pmod 4$, show that we can factor $x^4 + 1$ as $(x^2 + bx + a)(x^2 - bx + a) \pmod p$, for some values of $a$ and $b \pmod p$.


One of the more surprising irreducibility criteria was given by Eisenstein in the 1840s, and corresponds to reducing $f(x) \pmod{p^2}$ for an appropriate prime $p$.


Theorem 10.6 (Eisenstein’s Irreducibility Criterion). Suppose $f(x)$ is a polynomial with integer coefficients of degree $d \ge 1$, say $f(x) = \sum_{j=0}^{d} a_j x^j \in \mathbb{Z}[x]$. If $p$ is a prime for which $f(x) \equiv a_d x^d \not\equiv 0 \pmod p$, that is, $a_0 \equiv a_1 \equiv \cdots \equiv a_{d-1} \equiv 0 \pmod p$ and $p \nmid a_d$, whereas $p^2 \nmid f(0) = a_0$, then $f(x)$ is irreducible in $\mathbb{Z}[x]$.


Proof. If $f = gh$ with $g, h \in \mathbb{Z}[x]$, then $p$ divides $g(0)h(0) = f(0)$, but not $p^2$, and so $p$ divides only one of $g(0)$ and $h(0)$, say $g(0)$. We write $g(x) = \sum_{i=0}^{m} b_i x^i$ and $h(x) = \sum_{j=0}^{n} c_j x^j$, where $m + n = d$ with $m, n \ge 1$ and $p \nmid c_0 = h(0)$. Select $J$ to be the smallest integer for which $p$ does not divide $b_J$. Now $J$ exists as $g(x) \not\equiv 0 \pmod p$ (or else $f(x) = g(x)h(x) \equiv 0 \pmod p$, which is false by hypothesis), and $J \ge 1$ as $b_0 = g(0) \equiv 0 \pmod p$. Therefore $1 \le J \le m \le d - 1$ and we have $g(x) \equiv \sum_{i=J}^{m} b_i x^i \pmod p$ with $b_J \not\equiv 0 \pmod p$. This means that
$$f(x) = g(x)h(x) \equiv (b_m x^m + \cdots + b_J x^J)(c_n x^n + \cdots + c_0)$$
$$\equiv b_m c_n x^d + \cdots + c_0 b_J x^J \pmod p,$$
and so comparing the coefficients of $x^J$ on either side yields that $a_J \equiv c_0 b_J \not\equiv 0 \pmod p$, contradicting the hypothesis (as $J < d$). Therefore $f$ is irreducible.


Examples. Let $f(x) = x^5 - 6x + 3$. Then 3 divides all the coefficients of $f(x)$ except the first one, and 9 does not divide the coefficient of $x^0$. Hence $x^5 - 6x + 3$ is irreducible.
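The hypotheses of Theorem 10.6 are a finite check on the coefficients. The Python sketch below uses a function name of our own. It returns `True` when the criterion applies at the prime $p$. A `False` answer says nothing about reducibility, since the criterion is only a sufficient condition.

```python
def eisenstein_applies(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_d] (constant term first, a_d != 0).

    True iff p divides a_0, ..., a_{d-1}, p does not divide a_d,
    and p^2 does not divide a_0.
    """
    *lower, leading = coeffs
    return (leading % p != 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)


f = [3, -6, 0, 0, 0, 1]           # x^5 - 6x + 3
print(eisenstein_applies(f, 3))   # True, so x^5 - 6x + 3 is irreducible
print(eisenstein_applies(f, 2))   # False: the test at p = 2 is inconclusive
```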


The $p$th cyclotomic polynomial (see appendix 4E) is given by
$$\phi_p(x) = x^{p-1} + x^{p-2} + \cdots + 1 = \frac{x^p - 1}{x - 1},$$
for $p$ prime. The coefficients are all 1’s, so it does not seem that Eisenstein’s Irreducibility Criterion is applicable. However a change of variable yields that
$$\phi_p(x+1) = \frac{(x+1)^p - 1}{x} = \sum_{j=1}^{p} \binom{p}{j} x^{j-1}$$
by the binomial theorem. Now the leading coefficient is $\binom{p}{p} = 1$, $p$ divides $\binom{p}{j}$ for $1 \le j \le p-1$ by exercise 2.5.9(b), and $p^2$ does not divide the constant term $\binom{p}{1} = p$, so the conditions of Eisenstein’s Irreducibility Criterion are satisfied. Therefore $\phi_p(x+1)$ is irreducible, and so $\phi_p(x)$ is irreducible.
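The coefficients of $\phi_p(x+1)$ are the binomial coefficients $\binom{p}{j}$, $1 \le j \le p$, so the Eisenstein conditions can be confirmed numerically for small primes. The Python sketch below uses helper names of our own. It also sanity-checks the identity $\phi_p(x+1) = ((x+1)^p - 1)/x$ at $x = 2$, where it reads $\phi_p(3) = (3^p - 1)/2$.

```python
from math import comb


def shifted_cyclotomic(p):
    """Coefficients [c_0, ..., c_{p-1}] of phi_p(x + 1) = sum_{j=1}^p C(p, j) x^(j-1)."""
    return [comb(p, j) for j in range(1, p + 1)]


def eisenstein_applies(coeffs, p):
    """Eisenstein's conditions at p for coeffs = [a_0, ..., a_d]."""
    *lower, leading = coeffs
    return (leading % p != 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)


for p in [2, 3, 5, 7, 11, 13]:
    c = shifted_cyclotomic(p)
    value_at_2 = sum(cj * 2**j for j, cj in enumerate(c))
    print(p, c, eisenstein_applies(c, p), value_at_2 == (3**p - 1) // 2)
```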


10.20. Testing whether a polynomial is squarefree 


This is easy! In section 2.10 of appendix 2B we observed that $f(x) \in \mathbb{Z}[x]$ has repeated roots if and only if $f(x)$ and $f'(x)$ have a common factor, and in section 2.11 of appendix 2B that this can easily be determined by applying the Euclidean algorithm in $\mathbb{Z}[x]$ to $f(x)$ and $f'(x)$. The algorithm terminates either in an integer, the discriminant of $f$, or in a polynomial which is the largest common polynomial factor of $f$ and $f'$.
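The Python sketch below implements this test under one assumption of ours. It runs the Euclidean algorithm over $\mathbb{Q}$ with exact `fractions.Fraction` arithmetic, rather than in $\mathbb{Z}[x]$. So instead of ending in the discriminant, it simply reports whether $\gcd(f, f')$ is a non-zero constant. Coefficient lists are indexed by degree with the constant term first, and the helper names are hypothetical.

```python
from fractions import Fraction


def trim(f):
    """Drop zero leading coefficients (index = degree)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(f, g):
    """Remainder of f on division by the non-zero polynomial g, over Q."""
    f = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        factor = f[-1] / g[-1]
        for i, c in enumerate(g):
            f[i + shift] -= factor * c
        f = trim(f)
    return f


def derivative(f):
    return [i * c for i, c in enumerate(f)][1:]


def poly_gcd(f, g):
    """Monic gcd of f and g over Q (f non-zero), by the Euclidean algorithm."""
    f, g = trim(Fraction(c) for c in f), trim(Fraction(c) for c in g)
    while g:
        f, g = g, poly_rem(f, g)
    return [c / f[-1] for c in f]


def is_squarefree(f):
    """True iff f has no repeated roots, i.e. gcd(f, f') is a non-zero constant."""
    return len(poly_gcd(f, derivative(f))) == 1


print(is_squarefree([-1, 0, 1]))      # x^2 - 1: True
print(is_squarefree([1, -2, 1]))      # (x - 1)^2: False
```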


The analogous question for integers, to determine whether a given integer is 
squarefree, seems to be much more challenging. Indeed there is no known algorithm 
to test this that is any faster than fully factoring the integer and then seeing if any 
of the prime factors appear in the factorization to a power greater than 1. 




10.21. Factoring a squarefree polynomial modulo p 


We wish to factor a given polynomial $f(x) \pmod p$ of degree $d$, which we know has no repeated factors (as can be tested by determining the gcd of $f(x)$ and $f'(x)$ mod $p$). We now discuss Berlekamp’s 1967 algorithm. Let
$$S(f) = \{g(x) \pmod p : \deg(g) \le d - 1 \text{ and } g(x)^p \equiv g(x) \pmod{(p, f(x))}\}.$$
We have $m \in S(f)$ for each constant $m \pmod p$.
Lemma 10.21.1. Suppose that $f(x) \equiv P_1 \cdots P_r \pmod p$, where the $P_j(x)$ are distinct and irreducible polynomials mod $p$. There is a bijection $\phi : S(f) \to (\mathbb{Z}/p\mathbb{Z})^r$ given by $\phi(g) = (n_1, \ldots, n_r)$, where $g(x) \equiv n_j \pmod{(p, P_j(x))}$ for $1 \le j \le r$.

Proof. If $g \in S(f)$, then
$$P_1 \cdots P_r = f(x) \text{ divides } g(x)^p - g(x) = g(x)(g(x)-1)\cdots(g(x)-(p-1)) \pmod p,$$
and so each $P_j(x)$, being irreducible, divides some $g(x) - n_j \pmod p$.

In the other direction, the congruences $g(x) \equiv n_j \pmod{(p, P_j)}$ for each $j$ are satisfied by a unique congruence class modulo $(p, P_1 \cdots P_r) = (p, f)$, by the Chinese Remainder Theorem. Let $g(x)$ be the unique element of that congruence class of degree $< d$. For each $j$, we have
$$g(x)^p \equiv n_j^p \equiv n_j \equiv g(x) \pmod{(p, P_j)},$$
which implies that each $P_j$ divides $g^p - g \pmod p$, and so $f$ divides $g^p - g \pmod p$; that is, $g(x) \in S(f)$.


Hence, S(f) is the set of constant functions mod p if and only if r = 1. 
Lemma 10.21.2. If $g \in S(f)$ but is not a constant, then there exists an integer $n \pmod p$ such that
$$\gcd(f(x), g(x) - n) \pmod p$$
is a proper factor of $f(x) \pmod p$.

Proof. In Lemma 10.21.1 we saw that each $P_j(x)$ divides $g(x) - n$ for some $n \pmod p$. If so, then $\gcd(f(x), g(x) - n)$ is a proper factor of $f(x) \pmod p$, or else $f(x)$ divides $g(x) - n \not\equiv 0 \pmod p$, which is impossible as $\deg(f) > \deg(g - n)$.
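Here is Lemma 10.21.2 in action. Take $f(x) = x^4 + 1$ and $p = 5$. One can check that $g(x) = x^2$ lies in $S(f)$, because $x^{10} = x^8 \cdot x^2 \equiv x^2 \pmod{(5, x^4+1)}$. The Python sketch below uses helper names of our own, with coefficient lists indexed by degree and the constant term first. Its modular inverses use `pow(a, -1, p)`, which needs Python 3.8 or later. It verifies $g^p \equiv g$, then computes $\gcd(f, g - n) \bmod p$ for every residue $n$ and keeps the non-trivial gcds.

```python
def trim(f):
    """Drop zero leading coefficients (index = degree)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(f, g, p):
    """Remainder of f modulo g over Z/pZ (g must be non-zero mod p)."""
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    inv_lead = pow(g[-1], -1, p)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        factor = f[-1] * inv_lead % p
        for i, c in enumerate(g):
            f[i + shift] = (f[i + shift] - factor * c) % p
        f = trim(f)
    return f


def poly_gcd(f, g, p):
    """Monic gcd of f and g over Z/pZ (f non-zero mod p)."""
    f, g = trim(c % p for c in f), trim(c % p for c in g)
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]


def mul_rem(a, b, f, p):
    """(a * b) mod (p, f)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    return poly_rem(prod, f, p)


def in_S(g, f, p):
    """Check g(x)^p == g(x) mod (p, f(x)) by p successive multiplications."""
    power = [1]
    for _ in range(p):
        power = mul_rem(power, g, f, p)
    return power == poly_rem(g, f, p)


def split_with(f, g, p):
    """The non-trivial gcd(f, g - n) mod p over all residues n."""
    factors = []
    for n in range(p):
        g_minus_n = [(g[0] - n) % p] + list(g[1:])
        h = poly_gcd(f, g_minus_n, p)
        if len(h) > 1:
            factors.append(h)
    return factors


f, g, p = [1, 0, 0, 0, 1], [0, 0, 1], 5     # f = x^4 + 1, g = x^2
print(in_S(g, f, p))          # True
print(split_with(f, g, p))    # [[3, 0, 1], [2, 0, 1]], i.e. x^2 + 3 and x^2 + 2
```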


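We need to determine the elements of $S(f)$. For each $k$, $0 \le k \le d - 1$, there exist $c_{k,j} \pmod p$ for which
$$x^{pk} \equiv \sum_{j=0}^{d-1} c_{k,j}\, x^j \pmod{(p, f(x))}.$$
Therefore if $g(x) = \sum_{j=0}^{d-1} g_j x^j$, then $g(x) \in S(f)$ if and only if
$$\sum_{j=0}^{d-1} g_j x^j = g(x) \equiv g(x)^p \equiv g(x^p) = \sum_{k=0}^{d-1} g_k x^{pk} \equiv \sum_{j=0}^{d-1}\Big(\sum_{k=0}^{d-1} g_k c_{k,j}\Big) x^j \pmod{(p, f(x))}.$$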




Comparing the coefficients of each side we deduce that
$$g \in S(f) \text{ if and only if } \mathbf{g}(C - I) \equiv 0 \pmod p,$$
where $\mathbf{g} = (g_0, \ldots, g_{d-1}) \in (\mathbb{Z}/p\mathbb{Z})^d$ and $C = C(f)$ is the $d$-by-$d$ matrix with $(i,j)$th entry $c_{i,j}$. Therefore $V(f) = \{\mathbf{g} : g(x) \in S(f)\}$ is the (right)-null space of $C(f) - I \pmod p$, something that is easy to calculate in practice using the tools of linear algebra. We deduce that $V(f)$ is a vector space of dimension $r$, and any basis which includes the vector $(1, 0, \ldots, 0)$ gives rise to a basis $1, g_2, \ldots, g_r$ for $S(f)$. In other words
$$S(f) = \{a_1 + a_2 g_2(x) + \cdots + a_r g_r(x) \pmod p : a_1, \ldots, a_r \pmod p\}.$$


(It is easy to show that $S(f)$ can be so generated, for if $v, w \in S(f)$ and $a, b$ are residues mod $p$, then
$$(av(x) + bw(x))^p \equiv a^p v(x)^p + b^p w(x)^p \equiv av(x) + bw(x) \pmod{(p, f(x))},$$
so that $av + bw \in S(f)$.)
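The linear algebra is easy to mechanize. The Python sketch below uses helper names of our own and assumes that $f$ is squarefree mod $p$ with leading coefficient not divisible by $p$. Row $k$ of $C$ holds the coefficients of $x^{pk} \bmod (p, f(x))$. The number of irreducible factors is then $r = d - \operatorname{rank}(C - I)$, the dimension of $V(f)$. For $x^4 + 1$ it gives 2 for $p = 3$ and $p = 5$, in line with exercise 10.19.1. It gives 4 for $p = 17$, where $x^4 + 1$ splits into linear factors.

```python
def trim(f):
    """Drop zero leading coefficients (index = degree)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(f, g, p):
    """Remainder of f modulo g over Z/pZ (g must be non-zero mod p)."""
    f = trim(c % p for c in f)
    g = trim(c % p for c in g)
    inv_lead = pow(g[-1], -1, p)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        factor = f[-1] * inv_lead % p
        for i, c in enumerate(g):
            f[i + shift] = (f[i + shift] - factor * c) % p
        f = trim(f)
    return f


def berlekamp_matrix(f, p):
    """Row k (0 <= k < d) holds c_{k,0}, ..., c_{k,d-1}, i.e. x^(p*k) mod (p, f)."""
    d = len(trim(f)) - 1
    rows = []
    for k in range(d):
        rem = poly_rem([0] * (p * k) + [1], f, p)
        rows.append(rem + [0] * (d - len(rem)))
    return rows


def rank_mod_p(M, p):
    """Rank of the matrix M over Z/pZ, by Gaussian elimination."""
    M = [[x % p for x in row] for row in M]
    rank = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(rank, len(M)) if M[i][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)
        M[rank] = [x * inv % p for x in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][col]:
                factor = M[i][col]
                M[i] = [(x - factor * y) % p for x, y in zip(M[i], M[rank])]
        rank += 1
    return rank


def number_of_irreducible_factors(f, p):
    """r = d - rank(C - I) = dim V(f), for f squarefree mod p."""
    C = berlekamp_matrix(f, p)
    d = len(C)
    C_minus_I = [[C[i][j] - (1 if i == j else 0) for j in range(d)] for i in range(d)]
    return d - rank_mod_p(C_minus_I, p)


x4_plus_1 = [1, 0, 0, 0, 1]
print([number_of_irreducible_factors(x4_plus_1, p) for p in (3, 5, 17)])   # [2, 2, 4]
```

A basis of the null space of $C - I$ would supply the polynomials $g_2, \ldots, g_r$ used in the splitting step below. The sketch stops at counting, to keep it short.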


We claim that for any $1 \le i < j \le r$ there exists an integer $k$, $1 \le k \le r$, such that $g_k(x) \equiv n_i \pmod{(P_i(x), p)}$ and $g_k(x) \equiv n_j \pmod{(P_j(x), p)}$ for some residues $n_i \not\equiv n_j \pmod p$. If not, then for each $k$ there exists $n_k \pmod p$ such that $g_k(x) \equiv n_k \pmod{(P_i(x)P_j(x), p)}$. Any $g \in S(f)$ is a linear combination of $g_1, \ldots, g_r \pmod p$ (where $g_1 = 1$), and so there exists $n(g) \pmod p$ such that $g(x) \equiv n(g) \pmod{(P_i(x)P_j(x), p)}$. But this contradicts the fact that $\phi$ is a bijection, which we proved in Lemma 10.21.1.

We have therefore proved that there exists an integer $k$ for which $P_i(x)$ divides $\gcd(f(x), g_k(x) - n_i) \pmod p$ and $P_j(x)$ divides $\gcd(f(x), g_k(x) - n_j) \pmod p$, with $n_i \not\equiv n_j \pmod p$. Therefore $P_i(x)$ and $P_j(x)$ appear in different gcds. So if we calculate the
$$\gcd(f(x), g_k(x) - n) \pmod p \quad \text{for } 2 \le k \le r \text{ and all } n \pmod p,$$
and then the gcds of all these factors, we will obtain all of the $P_j$ by exercise 10.21.1.


Exercise 10.21.1. (a) Suppose that $S_1, \ldots, S_m \subseteq \{1, \ldots, r\}$ have the property that for any $i \ne j$ there exists $k$ such that $i \in S_k$ but $j \notin S_k$. Prove that for each $h$, $1 \le h \le r$, there is a subset $J_h \subseteq \{1, \ldots, m\}$ for which $\bigcap_{k \in J_h} S_k = \{h\}$.

(b) Let $P_1, \ldots, P_r$ be irreducible polynomials mod $p$. Suppose we are given a collection of polynomials $h_1(x), \ldots, h_m(x) \pmod p$ which are each products of some subset of the $P_j(x)$, with the property that for any $i \ne j$ there exists $k$ such that $P_i$ divides $h_k$, but $P_j$ does not. Show that if we take all the possible gcds of the $h_k$, we will obtain each of the $P_j$.


We have established a technique to factor any given $f(x) \pmod p$ into irreducibles. In section 16.4 we will use this to provide an efficient algorithm to factor polynomials in $\mathbb{Z}[x]$.




Chapter 11

Rational approximations to real numbers


How well can we approximate a real number by rational numbers? Obviously we can approximate $\pi$ by $3, 3.1, 3.14$, etc., but there are even better approximations like $3, \frac{22}{7}, \frac{333}{106}, \frac{355}{113}, \ldots$ (see section 9 of appendix 11B for details). Are these the “best” approximations? And how do we measure how good an approximation is? We study these questions in detail in this chapter.

To start with we could ask how well we could approximate a rational number $\alpha = p/q$ with $(p, q) = 1$ and $q \ge 1$, by other, unequal, rational numbers. For any rational $m/n$ with $n \ge 1$, which is $\ne p/q$, the difference is

$$\left|\frac{p}{q} - \frac{m}{n}\right| = \frac{|pn - qm|}{qn} \ge \frac{1}{qn}, \tag{11.0.1}$$

since $|pn - qm|$ is a non-negative integer that cannot be 0 as $p/q \ne m/n$, and so must be $\ge 1$. We have therefore shown that the difference between rational $\alpha$ and an approximation $m/n$ is at least some constant (in this case $1/q$) times $1/n$. We will see in the next section that one obtains much better approximations when $\alpha$ is real and irrational.


11.1. The pigeonhole principle 


If real irrational $\alpha$ is very close to $m/n$, then $n\alpha$ must be close to $m$, so we are interested in how close the integer multiples of a given real number $\alpha$ can be to an integer. Dirichlet noted that one can get a surprisingly good answer to this question using the pigeonhole principle.




Theorem 11.1 (Dirichlet’s Theorem). Suppose that $\alpha$ is a given real number. For every integer $N \ge 1$ there exists a positive integer $n \le N$ such that
$$|n\alpha - m| < \frac{1}{N}$$
for some integer $m$. In other words,
$$\left|\alpha - \frac{m}{n}\right| < \frac{1}{nN}.$$


Proof. The $N+1$ numbers $\{0\cdot\alpha\}, \{1\cdot\alpha\}, \{2\cdot\alpha\}, \ldots, \{N\cdot\alpha\}$ (where $\{t\}$ denotes the fractional part of $t$) all lie in the interval $[0,1)$. The intervals
$$\Big[0, \frac1N\Big),\ \Big[\frac1N, \frac2N\Big),\ \ldots,\ \Big[\frac{N-1}{N}, 1\Big)$$
partition¹ $[0,1)$, and so each of our $N+1$ numbers lies in exactly one of the $N$ intervals. Therefore some interval must contain at least two of our numbers by the pigeonhole principle, say $\{i\alpha\}$ and $\{j\alpha\}$ with $0 \le i < j \le N$, so that $|\{i\alpha\} - \{j\alpha\}| < \frac1N$. Therefore, if $n := j - i$, then $1 \le n \le N$, and if $m := [j\alpha] - [i\alpha] \in \mathbb{Z}$, then
$$n\alpha - m = (j\alpha - i\alpha) - ([j\alpha] - [i\alpha]) = \{j\alpha\} - \{i\alpha\},$$
and the first result follows by taking absolute values. The second result follows by dividing through by $n$.
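The proof is constructive, and its pigeonhole step translates directly into code. The Python sketch below uses a function name of our own. It relies on floating-point arithmetic, so it is only trustworthy for moderate $N$, where rounding errors are far smaller than $1/N$. Each $\{j\alpha\}$ is dropped into the box $[k/N, (k+1)/N)$ that contains it, and the first collision yields $n = j - i$ and $m = [j\alpha] - [i\alpha]$.

```python
import math


def dirichlet_approximation(alpha, N):
    """Return (n, m) with 1 <= n <= N and |n*alpha - m| < 1/N, found as in the
    pigeonhole proof (floating point, so keep N moderate)."""
    first_in_box = {}                                  # box k -> first j landing in it
    for j in range(N + 1):
        frac = j * alpha - math.floor(j * alpha)       # {j*alpha}, in [0, 1)
        k = min(int(frac * N), N - 1)                  # box [k/N, (k+1)/N)
        if k in first_in_box:
            i = first_in_box[k]
            return j - i, math.floor(j * alpha) - math.floor(i * alpha)
        first_in_box[k] = j
    raise AssertionError("N + 1 numbers cannot fit in N boxes")


for N in (10, 100, 1000):
    n, m = dirichlet_approximation(math.pi, N)
    print(N, n, m, abs(n * math.pi - m) < 1 / N)
```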


Exercise 11.1.1. Prove that for any irrational real number $\alpha$ there are arbitrarily small positive real numbers of the form $a + b\alpha$ with $a, b \in \mathbb{Z}$.


Corollary 11.1.1. If $\alpha$ is a real irrational number, then there are infinitely many pairs $m, n$ of coprime integers for which
$$\left|\alpha - \frac{m}{n}\right| < \frac{1}{n^2}.$$

For large $n$ this is a far better approximation of $\alpha$ than one can obtain for rational numbers, as we saw in (11.0.1).


Proof. Suppose that we are given a finite list, $(m_j, n_j)$, $1 \le j \le k$, of solutions to this inequality. Since this is a finite list there is some solution with $|n_j\alpha - m_j|$ minimal, and $|n_j\alpha - m_j|$ must be $> 0$ as $\alpha$ is irrational. Therefore we can let $N$ be the smallest integer $> 1/\min_{1 \le j \le k}\{|n_j\alpha - m_j|\}$. By Dirichlet’s Theorem there exists $n \le N$ such that
$$\left|\alpha - \frac{m}{n}\right| < \frac{1}{nN} \le \frac{1}{n^2}.$$
Now
$$|n\alpha - m| < \frac{1}{N} < |n_j\alpha - m_j| \quad \text{for all } j,$$
and so $(n, m)$ is another solution to the inequality, not included in the list. This implies that any finite list of solutions can be extended, and so there are infinitely many solutions.


¹ That is, each point of $[0,1)$ lies in exactly one of these intervals, and the union of these intervals exactly equals $[0,1)$.




Dirichlet’s Theorem is a very useful result as we will now exhibit by reproving 
two big results from earlier in the book: 


Another proof of Corollary 3.5.2. [If $(a, m) = 1$, then $a$ has an inverse mod $m$.] Take $m \ge 2$. Let $\alpha = \frac{a}{m}$ and $N = m - 1$ in Dirichlet’s Theorem, so that there exist integers $r$ and $s$ with $1 \le r \le m - 1$ such that $|ra/m - s| < 1/(m-1)$; that is, $|ra - sm| < m/(m-1) \le 2$. Hence $ra - sm = -1$, $0$, or $1$. It cannot equal 0, or else $m \mid sm = ar$ and $(m, a) = 1$, so that $m \mid r$, which is impossible as $1 \le r < m$. Hence $ra \equiv \pm 1 \pmod m$, and so $\pm r$ is the inverse of $a \pmod m$.
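The argument turns into a short search. The Python sketch below uses a function name of our own. It looks for the first $r$, $1 \le r \le m-1$, with $ra - sm = \pm 1$, where $s$ is the nearest integer to $ra/m$. Dirichlet’s Theorem is what guarantees that the search succeeds when $(a, m) = 1$. All arithmetic is exact integer arithmetic. In practice one would instead use the extended Euclidean algorithm, or Python’s built-in `pow(a, -1, m)`.

```python
def inverse_via_dirichlet(a, m):
    """Inverse of a modulo m (m >= 2, gcd(a, m) = 1), following the argument
    above: find 1 <= r <= m - 1 and s with r*a - s*m = +1 or -1."""
    for r in range(1, m):
        s = (2 * r * a + m) // (2 * m)      # nearest integer to r*a/m
        diff = r * a - s * m
        if diff == 1:
            return r % m
        if diff == -1:
            return (-r) % m
    raise ValueError("a has no inverse modulo m")


print(inverse_via_dirichlet(3, 7), inverse_via_dirichlet(10, 17))   # 5 12
```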


We saw an important use of the pigeonhole principle in number theory in the proof of Theorem 9.1, and this idea was generalized significantly by Minkowski and others. Now we reprove a second earlier result using Dirichlet’s Theorem:


Another proof of the theorem [If $-1$ is a square mod $n$, then $n$ is the sum of two squares.] Suppose that $r^2 \equiv -1 \pmod n$. By Dirichlet’s Theorem there exists a positive integer $b \le \sqrt n$ such that $\left|-\frac{r}{n} - \frac{c}{b}\right| < \frac{1}{b\sqrt n}$ for some integer $c$. Multiplying through by $bn$ we deduce that $|a| < \sqrt n$, where $a = rb + cn$. Now $a \equiv rb \pmod n$, and so $a^2 + b^2 \equiv r^2b^2 + b^2 = (r^2 + 1)b^2 \equiv 0 \pmod n$, and $0 < a^2 + b^2 < n + n = 2n$, and so we must have $a^2 + b^2 = n$.
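Computationally, the argument says the following. Among $b = 1, \ldots, [\sqrt n]$, some multiple $rb$ has a representative $a \equiv rb \pmod n$ with $|a| < \sqrt n$, and then $a^2 + b^2 = n$. The Python sketch below uses a function name of our own. It simply tries each $b$, taking $a$ to be the representative of $rb \bmod n$ of least absolute value.

```python
from math import isqrt


def two_squares_from_root(n, r):
    """Given r with r^2 == -1 (mod n), return (a, b) with a^2 + b^2 == n."""
    if (r * r + 1) % n != 0:
        raise ValueError("r^2 must be -1 mod n")
    for b in range(1, isqrt(n) + 1):
        a = (r * b) % n
        if a > n // 2:                 # least absolute representative of r*b mod n
            a -= n
        if a * a + b * b == n:
            return a, b
    raise ValueError("no representation found")


print(two_squares_from_root(13, 5))    # (-3, 2): 9 + 4 = 13
```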


For irrational $\alpha$ one might ask how the numbers $\{\alpha\}, \{2\alpha\}, \ldots, \{N\alpha\}$ are distributed in $[0,1)$ as $N \to \infty$. In section 11.7 of appendix 11A we will show that the values are dense and even (roughly) equally distributed in $[0,1)$. This ties in with the geometry of the torus and with exponential sum theory.


The next two exercises are multidimensional generalizations of Dirichlet’s Theorem, with not dissimilar proofs.


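Exercise 11.1.2 (Simultaneous approximation). Suppose that $\alpha_1, \ldots, \alpha_k$ are given real numbers. Prove that for any positive integer $N$ there exists a positive integer $n \le N^k$ such that, for each $j$ in the range $1 \le j \le k$, there exists an integer $m_j$ for which
$$|n\alpha_j - m_j| < \frac{1}{N}.$$
Deduce that given $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$ there exist integers $q$, $1 \le q \le Q$, and $p_1, \ldots, p_k$ such that
$$\left|\alpha_1 - \frac{p_1}{q}\right|,\ \left|\alpha_2 - \frac{p_2}{q}\right|,\ \ldots,\ \left|\alpha_k - \frac{p_k}{q}\right| < \frac{1}{q^{1+1/k}}.$$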


Exercise 11.1.3. Suppose that $\alpha_1, \ldots, \alpha_k$ are given real numbers. Prove that for any positive integer $N$ there exist integers $n_1, n_2, \ldots, n_k$, not all zero, with each $|n_j| \le N$, and an integer $m$ for which
$$|n_1\alpha_1 + n_2\alpha_2 + \cdots + n_k\alpha_k - m| < \frac{1}{N^k}.$$




11.2. Pell’s equation 


Perhaps the most researched equation in the early history of number theory is the so-called Pell equation²: Are there non-trivial integer solutions $x, y$ to
$$x^2 - dy^2 = 1?$$
(The “trivial solutions” are $x = \pm 1$ and $y = 0$.) The best-known ancient example
comes from comparing the number of points in triangles of points, with the number 
of points in squares of points: 


This triangle has $1 + 2 + 3 + 4 = 10$ points, whereas this square has $4 \times 4 = 16$. In general a triangle with $m$ rows has $\frac{m(m+1)}{2}$ points, and a square with $n$ rows has $n^2$ points. The numbers appearing in these two lists are mostly different, but there are exceptions, for example, 1, and then $36 = \frac{8 \cdot 9}{2} = 6^2$, and then $1225 = \frac{49 \cdot 50}{2} = 35^2$. So are there arbitrarily many “triangular numbers” that are also squares? More precisely, we are asking whether there are infinitely many pairs of integers $m, n$ such that
$$\frac{m(m+1)}{2} = n^2.$$

It makes sense to clear denominators and to “complete the square” on the left side. Then we get
$$(2m+1)^2 = 4m^2 + 4m + 1 = 8 \cdot \frac{m(m+1)}{2} + 1 = 8n^2 + 1.$$
Taking $x = 2m + 1$ and $y = 2n$ gives a solution to the Pell equation
$$x^2 - 2y^2 = 1.$$


On the other hand note that any solution to the Pell equation must have $x$ odd, so is of the form $2m + 1$, which implies that $2y^2 = x^2 - 1 \equiv 1 - 1 = 0 \pmod 8$, and so $y$ is even and therefore must be of the form $2n$. (Our examples of triangular numbers above therefore correspond to the solutions $3^2 - 2 \cdot 2^2 = 1$, $17^2 - 2 \cdot 12^2 = 1$, and $99^2 - 2 \cdot 70^2 = 1$ to Pell’s equation.) So we have proved that the set of triangular numbers that are also squares is in 1-to-1 correspondence with the positive integer solutions to this Pell equation.
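This correspondence can be checked by brute force. The Python sketch below uses names of our own. It finds the first few positive solutions of $x^2 - 2y^2 = 1$ by trying $y = 1, 2, \ldots$, and converts each one via $m = (x-1)/2$, $n = y/2$.

```python
from math import isqrt


def pell_solutions_brute(d, count):
    """First `count` solutions of x^2 - d*y^2 = 1 in positive integers (y = 1, 2, ...)."""
    sols, y = [], 1
    while len(sols) < count:
        x = isqrt(d * y * y + 1)
        if x * x == d * y * y + 1:
            sols.append((x, y))
        y += 1
    return sols


for x, y in pell_solutions_brute(2, 4):
    m, n = (x - 1) // 2, y // 2
    print((x, y), m * (m + 1) // 2, n * n)    # the triangular number equals the square
```

The fourth line of output shows the next example after 1225, namely $41616 = \frac{288 \cdot 289}{2} = 204^2$.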


We will show in Theorem 11.2 that there is a non-trivial solution to Pell’s equation $x^2 - dy^2 = 1$ for every non-square integer $d > 1$. This was evidently known to Brahmagupta in India in 628 A.D., and one can guess that it was well understood by Archimedes far earlier, judging by his “Cattle Problem”, which follows.

² In 1657 Fermat challenged Frénicle, Brouncker, Wallis, and “all mathematicians” to create a method for finding solutions to Pell’s equation. Brouncker showed that he had done so by determining the smallest solution for $d = 313$, namely $x = 32188120829134849$, $y = 1819380158564160$. It seems that Euler attributed the equation to Pell because Rahn published an algebra book with Pell’s help in 1658, which contained an example of this type of equation. The name stuck.


The Sun god’s cattle, friend, apply thy care 

to count their number, hast thou wisdom’s share. 
They grazed of old on the Thrinacian floor 

of Sic’ly’s island, herded into four, 

colour by colour: one herd white as cream, 

the next in coats glowing with ebon gleam, 
brown-skinned the third, and stained with spots the last.

Each herd saw bulls in power unsurpassed, 

in ratios these: count half the ebon-hued, 

add one third more, then all the brown include; 
thus, friend, canst thou the white bulls’ number tell. 
The ebon did the brown exceed as well, 

now by a fourth and fifth part of the stained. 

To know the spotted — all bulls that remained — 
reckon again the brown bulls, and unite 

these with a sixth and seventh of the white. 
Among the cows, the tale of silver-haired 

was, when with bulls and cows of black compared, 
exactly one in three plus one in four. 

The black cows counted one in four once more, 
plus now a fifth, of the bespeckled breed 

when, bulls withal, they wandered out to feed. 
The speckled cows tallied a fifth and sixth 


of all the brown-haired, males and females mixed. 
Lastly, the brown cows numbered half a third 
and one in seven of the silver herd. 

Tell’st thou unfailingly how many head 

the Sun possessed, o friend, both bulls well-fed 
and cows of ev’ry colour — no-one will 

deny that thou hast numbers’ art and skill, 
though not yet dost thou rank among the wise. 
But come! also the foll’wing recognise. 


Whene’er the Sun god’s white bulls joined the black,

their multitude would gather in a pack 

of equal length and breadth, and squarely throng 
Thrinacia’s territory broad and long. 

But when the brown bulls mingled with the flecked, 
in rows growing from one would they collect, 
forming a perfect triangle, with ne’er 

a diff’rent-coloured bull, and none to spare. 
Friend, canst thou analyse this in thy mind, 

and of these masses all the measures find, 

go forth in glory! be assured all deem 

thy wisdom in this discipline supreme! 


— from an epigram written to ERATOSTHENES of Cyrene 
by ARCHIMEDES (of Alexandria), 250 B.C.³


The first paragraph involves only linear equations. To resolve the second, one needs to find a non-trivial solution in integers $u, v$ to
$$u^2 - 609 \cdot 7766\, v^2 = 1.$$
The smallest solution is enormous, the smallest herd having about $7.76 \times 10^{206544}$ cattle: it wasn’t until 1965 that anyone was able to write down all 206545 decimal digits! How did Archimedes know that the solution would be ridiculously large? We don’t know, though presumably he did not ask this question by chance.


The next result, the main result of this section, presumably known to many 
ancient mathematicians, is that there is always a solution to Pell’s equation. 


Theorem 11.2. Let $d \ge 2$ be a given non-square integer. There exist integers $x, y$ for which
$$x^2 - dy^2 = 1$$
with $y \ne 0$. If $x_1, y_1$ yields the smallest solution in positive integers⁴, then all other solutions in positive integers are given by the recursion
$$x_{n+1} = x_1 x_n + d y_1 y_n \quad \text{and} \quad y_{n+1} = x_1 y_n + y_1 x_n$$
for $n \ge 1$.


We call the pair $(x_1, y_1)$ the fundamental solution to Pell’s equation. Another way to write the recursion is that
$$x_n + \sqrt d\, y_n = (x_1 + \sqrt d\, y_1)^n \quad \text{for every integer } n \ge 1,$$
where we match the coefficients of $\sqrt d$ on each side to determine $y_n$, and what remains, the coefficients of 1 on each side, to determine $x_n$.

³ Archimedes, The Cattle Problem, in English verse by S. J. P. Hillion & H. W. Lenstra Jr., Mercator, Santpoort, 1999.

⁴ We measure the size of the solutions in positive integers $x, y$ by the number $x + \sqrt d\, y$, though we would have the same ordering if we used either $x$ or $y$.


Proof. We begin by showing that there always exists a solution to $x^2 - dy^2 = 1$ in integers with $y \ne 0$. By Corollary 11.1.1 there exist infinitely many pairs of integers $(m, n)$ such that $\left|\sqrt d - \frac{m}{n}\right| < \frac{1}{n^2}$. For these pairs $(m, n)$ we have
$$|m^2 - dn^2| = n^2 \left|\sqrt d - \frac mn\right| \left|\sqrt d + \frac mn\right| < \left|\sqrt d + \frac mn\right| \le 2\sqrt d + \left|\sqrt d - \frac mn\right| < 2\sqrt d + 1.$$
This implies that $|m^2 - dn^2|$ must be an integer $< 2\sqrt d + 1$, so there must be some non-zero integer $r$, with $|r| < 2\sqrt d + 1$, for which there are infinitely many pairs of positive integers $m, n$ such that $m^2 - dn^2 = r$. Pick such an $r$ with $|r|$ smallest. We can assume that each $(m, n) = 1$, or else if $(m, n) = g > 1$ occurs infinitely often, then we have infinitely many solutions $(m/g)^2 - d(n/g)^2 = r/g^2$, contradicting the minimality of $|r|$.


Since there are only $r^2$ pairs of residue classes $(m \bmod r, n \bmod r)$, there must be some pair of residue classes $a, b$ such that there are infinitely many pairs of integers $m, n$ for which $m^2 - dn^2 = r$ with $m \equiv a \pmod r$ and $n \equiv b \pmod r$. Let $m, n$ be the smallest such pair, and $m_1, n_1$ any other such pair, so that $m_1^2 - dn_1^2 = m^2 - dn^2 = r$ with $m_1 \equiv m \pmod r$ and $n_1 \equiv n \pmod r$. This implies that $r \mid (m_1 n - n_1 m)$ and
$$(m_1 m - d n_1 n)^2 - d(m_1 n - n_1 m)^2 = (m_1^2 - dn_1^2)(m^2 - dn^2) = r^2,$$
so that $r^2$ divides $r^2 + d(m_1 n - n_1 m)^2 = (m_1 m - d n_1 n)^2$, and thus $r \mid (m_1 m - d n_1 n)$. Therefore $x = |m_1 m - d n_1 n|/r$ and $y = |m_1 n - n_1 m|/r$ are integers for which
$$x^2 - dy^2 = 1.$$


Exercise 11.2.1. Show that $y \ne 0$ using the fact that $(m, n) = 1$ for each such pair $m, n$.


We measure the size of solutions to Pell’s equation using the number $x + \sqrt d\, y$. If $x, y > 0$, then this is $> 1$. There are four solutions associated with each solution in positive integers $u, v$, and for these we have
$$u + \sqrt d\, v > 1 > u - \sqrt d\, v > 0 > -u + \sqrt d\, v > -1 > -u - \sqrt d\, v.$$
Therefore $x, y > 0$ if and only if $x + \sqrt d\, y > 1$.

Let $x_1, y_1$ be the solution to $x^2 - dy^2 = 1$ in positive integers with $x_1 + \sqrt d\, y_1$ minimal. We claim that all other solutions with $x, y > 0$ take the form $x + \sqrt d\, y = (x_1 + \sqrt d\, y_1)^n$. If not, let $x, y$ be the counterexample with $x, y > 0$ for which $x + \sqrt d\, y$ is smallest. Now $x + \sqrt d\, y > x_1 + \sqrt d\, y_1$, since $x_1 + \sqrt d\, y_1$ is minimal.

If $X = x_1 x - d y_1 y$ and $Y = x_1 y - y_1 x$, then $X^2 - dY^2 = (x_1^2 - dy_1^2)(x^2 - dy^2) = 1$, and
$$X + \sqrt d\, Y = (x_1 - \sqrt d\, y_1)(x + \sqrt d\, y) = \frac{x + \sqrt d\, y}{x_1 + \sqrt d\, y_1},$$
which implies that
$$1 < X + \sqrt d\, Y < x + \sqrt d\, y.$$




Hence $X, Y > 0$, and since $x, y$ was the smallest counterexample, we deduce that
$$X + \sqrt d\, Y = (x_1 + \sqrt d\, y_1)^m \quad \text{for some integer } m \ge 1,$$
and therefore $x + \sqrt d\, y = (x_1 + \sqrt d\, y_1)(X + \sqrt d\, Y) = (x_1 + \sqrt d\, y_1)^{m+1}$, a contradiction.

If we define $x_n + \sqrt d\, y_n = (x_1 + \sqrt d\, y_1)^n$, then we obtain the recursion given in the theorem by an easy induction argument. We also deduce that the $x_n, y_n > 0$, and so $x_1 < x_2 < \cdots$ and $y_1 < y_2 < \cdots$ from the recursion formulas.
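The recursion of Theorem 11.2 can be sketched in Python as follows (names ours). Here the fundamental solution is found by brute force over $y$, which is only practical when it is small; it would be hopeless for $d = 313$, say. The later solutions then come from $x_{n+1} = x_1 x_n + d y_1 y_n$ and $y_{n+1} = x_1 y_n + y_1 x_n$.

```python
from math import isqrt


def fundamental_solution_brute(d):
    """Smallest solution of x^2 - d*y^2 = 1 in positive integers (d non-square)."""
    y = 1
    while True:
        x = isqrt(d * y * y + 1)
        if x * x == d * y * y + 1:
            return x, y
        y += 1


def pell_solutions(d, count):
    """First `count` positive solutions, generated by the recursion of Theorem 11.2."""
    x1, y1 = fundamental_solution_brute(d)
    sols = [(x1, y1)]
    while len(sols) < count:
        xn, yn = sols[-1]
        sols.append((x1 * xn + d * y1 * yn, x1 * yn + y1 * xn))
    return sols


print(pell_solutions(2, 4))   # [(3, 2), (17, 12), (99, 70), (577, 408)]
print(pell_solutions(7, 3))   # [(8, 3), (127, 48), (2024, 765)]
```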


Exercise 11.2.2. Prove that if $a + \sqrt d\, b = x + \sqrt d\, y$, where $a, b, x, y, d$ are integers and $d$ is not a square, then $a = x$ and $b = y$.


Exercise 11.2.3. Prove, by induction, that $x_{n+2} = 2x_1 x_{n+1} - x_n$ and $y_{n+2} = 2x_1 y_{n+1} - y_n$ for all $n \ge 0$ (where $x_0 = 1$, $y_0 = 0$).


Exercise 11.2.4. Show that all solutions to Pell’s equation (not just the positive integer solutions) are given by the values $\pm(x_1 + \sqrt d\, y_1)^n$ (not just “+”), with $n \in \mathbb{Z}$ (not just $n \in \mathbb{N}$).


For technical reasons it is actually best to develop the analogous theory for the solutions to $x^2 - dy^2 = \pm 4$, as in appendix 11B, when we revisit Pell equations.


In the second half of the proof we saw how all of the solutions in positive 
integers can be generated from a fundamental solution. The proof is interesting 
in that it works by “descent”: Given a solution we find a smaller one. This is a 
technique that we saw several times in chapter 6. We will see it play a central role 
in section 11.3 and later when we study elliptic curves in chapter 17.


The proof of Theorem 11.2 is not constructive, in that the proof does not indicate how to find a solution. In a lemma of appendix 11B we will show how to find solutions using the continued fraction for $\sqrt d$ (as was known to all of the ancient mathematicians discussed here). How large is the smallest solution to Pell’s equation? We saw that it can be surprisingly large, as in Archimedes’s cattle problem. One can prove that the smallest solution is $< (8d)^{\sqrt d}$ (see appendix 13B). However what is surprising is that the smallest solution seems usually to be this large. This is not something that has been proved; indeed understanding the distribution of sizes of the smallest solutions to Pell’s equation is an outstanding open question in number theory.
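As a preview of the constructive method mentioned above, the Python sketch below expands $\sqrt d$ as a continued fraction and walks through its convergents $h/k$ until $h^2 - dk^2 = 1$. This is the standard continued-fraction algorithm, not the argument of Theorem 11.2, and the names are ours. It relies on the fact, to be shown in appendix 11B, that the fundamental solution appears among the convergents. Unlike brute force, it handles $d = 313$ instantly and recovers Brouncker’s solution from the footnote above.

```python
from math import isqrt


def pell_fundamental_cf(d):
    """Fundamental solution of x^2 - d*y^2 = 1 from the continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0            # current complete quotient is (m + sqrt(d)) / q
    h_prev, h = 1, a0             # convergent numerators
    k_prev, k = 0, 1              # convergent denominators
    while h * h - d * k * k != 1:
        m = a * q - m
        q = (d - m * m) // q
        a = (a0 + m) // q         # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


print(pell_fundamental_cf(313))   # (32188120829134849, 1819380158564160)
print(pell_fundamental_cf(61))    # (1766319049, 226153980)
```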

In Theorem 11.2 we saw that if $d > 1$ is a non-square integer, then there are always solutions in integers $x, y > 0$ to Pell’s equation $x^2 - dy^2 = 1$. This implies that
$$\sqrt d\, y\,(x - \sqrt d\, y) < (x + \sqrt d\, y)(x - \sqrt d\, y) = 1,$$
and so, dividing through by $\sqrt d\, y^2$, we exhibit rational approximations $x/y$ to $\sqrt d$ that satisfy
$$\left|\sqrt d - \frac{x}{y}\right| < \frac{1}{\sqrt d\, y^2},$$
which are better approximations than those that are given by Corollary 11.1.1.

Another issue is whether there is a solution to $u^2 - dv^2 = -1$, the negative Pell equation. Notice, for example, that $2^2 - 5 \cdot 1^2 = -1$. Evidently if there is a solution, then $-1$ is a square mod $d$, so that $d$ has no prime factors $\equiv -1 \pmod 4$. Moreover $d$ cannot be divisible by 4, or else $u^2 \equiv -1 \pmod 4$, which is impossible.


We saw that $x^2 - dy^2 = 1$ has solutions for every non-square $d > 1$, and one might have guessed that there would be some simple criterion to decide whether there are solutions to $u^2 - dv^2 = -1$, but there does not appear to be. For example there are no solutions for $d = 34$, 205, or 221, yet in each case there is no congruence that easily explains why not. This is a subject of ongoing research. We will discuss the negative Pell equation in the next paragraph as well as in section 11.13 of appendix 11B.
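The examples above can be tested numerically. The following Python sketch relies on a classical fact that is not proved in this section and that we simply assume here: $u^2 - dv^2 = -1$ has an integer solution exactly when the period of the continued fraction of $\sqrt{d}$ has odd length. The function names are ours.

```python
from math import isqrt

def cf_period_sqrt(d: int) -> list[int]:
    """One period [a1, ..., ar] of the continued fraction of sqrt(d), d > 1 non-square."""
    a0 = isqrt(d)
    m, s, a = 0, 1, a0            # complete quotient is (m + sqrt(d)) / s
    period = []
    while True:
        m = s * a - m
        s = (d - m * m) // s
        a = (a0 + m) // s
        period.append(a)
        if s == 1:                # the period ends when the denominator returns to 1
            return period

def negative_pell_solvable(d: int) -> bool:
    # Assumed classical criterion: solvable iff the period length is odd.
    return len(cf_period_sqrt(d)) % 2 == 1

print([d for d in (2, 5, 10, 13, 34, 205, 221) if negative_pell_solvable(d)])
# [2, 5, 10, 13]
```

For instance $\sqrt{13} = [3; \overline{1, 1, 1, 1, 6}]$ has odd period, matching $18^2 - 13 \cdot 5^2 = -1$, while $\sqrt{34} = [5; \overline{1, 4, 1, 10}]$ has even period.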


The case $d = 5$ has many fascinating properties. For example

$$1^2 - 5 \cdot 1^2 = -4, \quad 3^2 - 5 \cdot 1^2 = 4, \quad 4^2 - 5 \cdot 2^2 = -4, \quad 7^2 - 5 \cdot 3^2 = 4, \ \dots$$

All these solutions to $x^2 - 5y^2 = -4$ or $4$ are given by $\frac{x + \sqrt{5}\,y}{2} = \left(\frac{1 + \sqrt{5}}{2}\right)^n$. If there are solutions to $x^2 - dy^2 = \pm 4$ with $x, y$ both odd (as in this example), then $1 - d \equiv x^2 - dy^2 \equiv 0 \pmod 4$; that is, $d \equiv 1 \pmod 4$. If $d \equiv 1 \pmod 4$, then the proof of Theorem 11.2 can be used to prove there exist integers $u, v > 0$ such that:

All solutions to $x^2 - dy^2 = \pm 4$ with $x, y > 0$ are given by

$$\frac{x + \sqrt{d}\,y}{2} = \left(\frac{u + \sqrt{d}\,v}{2}\right)^n \quad \text{for some integer } n \ge 1.$$

To establish that there is at least one solution, take $x = 2r$, $y = 2s$ from a solution to $r^2 - ds^2 = 1$ given by Theorem 11.2. Now select the solution to our equation with $\frac{x + \sqrt{d}\,y}{2} > 1$ but minimal. The proof of Theorem 11.2, suitably modified, then gives that all other solutions are given by a power of this first one.

We call $\frac{u + \sqrt{d}\,v}{2}$ the fundamental solution to Pell’s equation and denote it by $\varepsilon_d$.


Exercise 11.2.5. The smallest solution to $x^2 - 2y^2 = 1$ is given by $(x, y) = (3, 2)$, which implies that $2^3$ and $3^2$ are consecutive powerful numbers (an integer $n$ is powerful if $p^2$ divides $n$ whenever a prime $p$ divides $n$). Use the theory of the solutions to $x^2 - 2y^2 = 1$ to prove that there are infinitely many pairs of consecutive powerful numbers.


11.3. Descent on solutions of $x^2 - dy^2 = n$, $d > 0$


Let $x_1, y_1$ be the fundamental solution to Pell’s equation, and let $\varepsilon_d = x_1 + y_1\sqrt{d}$ as in Theorem 11.2, so that $\varepsilon_d > 1$.


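Proposition 11.3.1. Given integers $d, n > 0$, the integer solutions $x, y$ to $x^2 - dy^2 = n$ are all given by $x + y\sqrt{d} = \pm\beta\,\varepsilon_d^k$ for some integer $k$ and some $\beta \in B$, where

$$B := \{u + \sqrt{d}\,v \in [\sqrt{n}, \sqrt{n}\,\varepsilon_d) : u, v \ge 0 \text{ and } u^2 - dv^2 = n\}.$$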


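Proof. Given a solution to $x^2 - dy^2 = n$, let $\alpha = |x + y\sqrt{d}|$. As $\varepsilon_d > 1$ the sequence of numbers $1, \varepsilon_d, \varepsilon_d^2, \dots$ increases to infinity, and the sequence of numbers $1, \varepsilon_d^{-1}, \varepsilon_d^{-2}, \dots$ decreases to 0. Therefore there exists a unique integer $k$ such that

$$\varepsilon_d^k \le \alpha/\sqrt{n} < \varepsilon_d^{k+1}.$$

Let $\beta := \alpha\,\varepsilon_d^{-k}$, so that $\sqrt{n} \le \beta < \sqrt{n}\,\varepsilon_d$. Therefore $x + y\sqrt{d}$ is of the form $\pm\beta\,\varepsilon_d^k$, where $\beta \in [\sqrt{n}, \sqrt{n}\,\varepsilon_d)$. Writing $\beta = u + \sqrt{d}\,v$ we obtain

$$u^2 - dv^2 = |(x + y\sqrt{d})(x - y\sqrt{d})| \cdot \big((x_1 + y_1\sqrt{d})(x_1 - y_1\sqrt{d})\big)^{-k} = (x^2 - dy^2)(x_1^2 - dy_1^2)^{-k} = n \cdot 1^{-k} = n.$$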




Moreover for a solution of $r^2 - ds^2 = n$ where $n > 0$, with $r, s > 0$, we have

$$\gamma := r + s\sqrt{d} > \sqrt{n} > n/\gamma = r - s\sqrt{d} > 0 > -r + s\sqrt{d} > -r - s\sqrt{d},$$

so of these four closely related solutions the unique one $> \sqrt{n}$ has both coordinates positive. In particular this implies that $u, v > 0$, so that $\beta \in B$.


For $n = 1$ we have $B = \{1\}$. In some questions $B$ can be empty; in others it can be large. For example, there are no solutions to $x^2 - dy^2 = n$ in integers if $n$ is not a square mod $d$.


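In the example $x^2 - 5y^2 = 209$, we have $\varepsilon_5 = \left(\frac{1 + \sqrt{5}}{2}\right)^6 = 9 + 4\sqrt{5}$ and, after a brief search, we discover that $B = \{17 + 4\sqrt{5},\ 23 + 8\sqrt{5},\ 47 + 20\sqrt{5},\ 73 + 32\sqrt{5}\}$.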
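The “brief search” can be automated. This Python sketch assumes $d$ is not a square and that the fundamental solution $(x_1, y_1)$ is already known. Since $u^2 = n + dv^2 > dv^2$ we have $u > \sqrt{d}\,v$, so every $\beta = u + \sqrt{d}\,v$ in $B$ satisfies $\beta > 2\sqrt{d}\,v$; this bounds the range of $v$ to scan. Floating point is adequate for numbers this small, and the function name is ours. One can check by hand that the elements found come in pairs: $47 + 20\sqrt{5} = (23 - 8\sqrt{5})\varepsilon_5$ and $73 + 32\sqrt{5} = (17 - 4\sqrt{5})\varepsilon_5$.

```python
from math import isqrt, sqrt

def descent_set(d: int, n: int, x1: int, y1: int) -> list[tuple[int, int]]:
    """All (u, v) with u, v >= 0, u^2 - d v^2 = n and
    sqrt(n) <= u + v sqrt(d) < sqrt(n) * eps, where eps = x1 + y1 sqrt(d)."""
    eps = x1 + y1 * sqrt(d)
    lo, hi = sqrt(n), sqrt(n) * eps
    result = []
    v = 0
    while 2 * v * sqrt(d) < hi:        # beta > 2 v sqrt(d), so larger v cannot give beta < hi
        u2 = n + d * v * v
        u = isqrt(u2)
        if u * u == u2 and lo <= u + v * sqrt(d) < hi:
            result.append((u, v))
        v += 1
    return result

print(descent_set(5, 209, 9, 4))
# [(17, 4), (23, 8), (47, 20), (73, 32)]
```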


Exercise 11.3.2. Prove that for any non-square positive integer $d$ and integer $n$ there is either no solution or infinitely many solutions to $x^2 - dy^2 = n$.


11.4. Transcendental numbers 


In section B.4 we proved that $\sqrt{d}$ is irrational if $d$ is an integer that is not the square of an integer. We can also prove that certain numbers are irrational simply by establishing how well they can be approximated by rationals:


Proposition 11.4.1. Suppose that $\alpha$ is a given real number. Then $\alpha$ is irrational if and only if for every integer $q \ge 1$ there exist integers $m, n$ such that

$$0 < |n\alpha - m| < \frac{1}{q}.$$


Proof. If $\alpha$ is rational, then $\alpha = p/q$ for some coprime integers $p, q$ with $q \ge 1$. For any integers $m, n$ we then have $n\alpha - m = (np - mq)/q$. Now, the value of $np - mq$ is an integer $\equiv np \pmod q$. Hence $|np - mq| = 0$ or is an integer $\ge 1$, and therefore $|n\alpha - m| = 0$ or is $\ge 1/q$, so the condition fails for this $q$.

If $\alpha$ is irrational, then Corollary 11.1.1 tells us that there are arbitrarily large coprime integers $m, n$ for which $0 < |n\alpha - m| < 1/n$. We select $n > q$ to prove the result claimed here.


There are several other methods to prove that numbers are irrational, but it is more challenging to prove that a number is transcendental, that is, that the number is not the root of a polynomial with integer coefficients.⁵ Next we show that algebraic numbers cannot be too well approximated by rationals. This suggests a method to identify a number as transcendental, generalizing how we identified irrationality in Proposition 11.4.1.


Theorem 11.3 (Liouville’s Theorem). Suppose that $\alpha$ is a root of an irreducible polynomial $f(x) \in \mathbb{Z}[x]$ of degree $d \ge 2$. There exists a constant $c_\alpha > 0$ (which depends only on $\alpha$⁶) such that for any rational $p/q$ with $(p, q) = 1$ and $q \ge 1$ we have

$$\left|\alpha - \frac{p}{q}\right| > \frac{c_\alpha}{q^d}.$$

⁵A root of a polynomial with integer coefficients is called an algebraic number.
Proof. Since $I := [\alpha - 1, \alpha + 1]$ is a closed interval, there exists a bound $B > 1$ for which $|f'(t)| < B$ for all $t \in I$. We will prove the result with $c_\alpha = 1/B$. If $p/q \notin I$, then $|\alpha - p/q| > 1 > c_\alpha \ge c_\alpha/q^d$, as desired. Henceforth we may assume that $p/q \in I$.

If $f(x) = \sum_{i=0}^{d} f_i x^i$ with each $f_i \in \mathbb{Z}$, then $q^d f(p/q) = \sum_{i=0}^{d} f_i p^i q^{d-i} \in \mathbb{Z}$. Now $f(p/q) \ne 0$ since $f$ is irreducible of degree $\ge 2$, and so $|q^d f(p/q)| \ge 1$.

The mean value theorem tells us that there exists $t$ lying between $\alpha$ and $p/q$, and hence in $I$, such that

$$f'(t) = \frac{f(\alpha) - f(p/q)}{\alpha - p/q}.$$

Therefore, as $f(\alpha) = 0$,

$$\left|\alpha - \frac{p}{q}\right| = \frac{|f(p/q)|}{|f'(t)|} > \frac{1}{Bq^d} = \frac{c_\alpha}{q^d}.$$
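As a numerical sanity check of the proof, take $\alpha = \sqrt{2}$ and $f(x) = x^2 - 2$. On $I = [\sqrt{2} - 1, \sqrt{2} + 1]$ we have $|f'(t)| = 2|t| \le 2(\sqrt{2} + 1) < 5$, so $B = 5$ is an admissible bound (our choice) and $c_\alpha = 1/5$ works. The Python sketch below checks $q^2\,|\alpha - p/q| > c_\alpha$ for the integer $p$ nearest to $q\sqrt{2}$ (the worst case for each $q$) and all $q$ up to a bound; floating point is adequate at this size, and the function name is ours.

```python
from math import sqrt

def liouville_ratio_min(alpha: float, degree: int, q_max: int) -> float:
    """Smallest value of q^degree * |alpha - p/q| over 1 <= q <= q_max,
    taking for each q the integer p nearest to q*alpha (the worst case)."""
    best = float("inf")
    for q in range(1, q_max + 1):
        p = round(q * alpha)
        best = min(best, q ** degree * abs(alpha - p / q))
    return best

c_alpha = 1 / 5          # 1/B with B = 5 > max |f'(t)| = 2(sqrt(2) + 1) on I
ratio = liouville_ratio_min(sqrt(2), 2, 10_000)
print(ratio > c_alpha)   # True
```

The minimum found is $6 - 4\sqrt{2} \approx 0.343$ (at $q = 2$), comfortably above $1/5$.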


Often students first learn that there are transcendental numbers by showing that the set of real numbers is uncountable, whereas the set of algebraic numbers is countable, so the vast majority of real numbers are transcendental. This argument shows that most real numbers are transcendental without actually constructing any! (See section 11.16 in appendix 11D.) The great advantage of Liouville’s Theorem is that it can be used to actually construct transcendental numbers.


Corollary 11.4.1. A Liouville number is an irrational real number $\alpha$ such that for every integer $n \ge 1$ there is a rational number $p/q$ with $(p, q) = 1$ and $q > 1$ for which

$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^n}.$$

Every Liouville number is transcendental.

Proof. Let $\alpha$ be a Liouville number. Suppose that $\alpha$ is algebraic, so that there exist $d$ and $c_\alpha$ as in Liouville’s Theorem. Select $n > d$ sufficiently large so that $2^{n-d} > 1/c_\alpha$. Then, selecting the approximation $p/q$ with $q > 1$ as in the hypothesis, we have

$$\frac{1}{q^n} > \left|\alpha - \frac{p}{q}\right| > \frac{c_\alpha}{q^d}$$

by Liouville’s Theorem. Therefore $2^{n-d} > 1/c_\alpha > q^{n-d}$, contradicting that $q \ge 2$. Therefore $\alpha$ is not algebraic and so must be transcendental.


⁶In this chapter there are several constants like $c_\alpha$ which depend only on the variable given in the subscript. We do not attempt to be more precise about the constant because calculating a value for it would make things much more complicated, yet one would gain little from knowing its precise value.




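For example

$$\alpha = \frac{1}{10} + \frac{1}{10^{2}} + \frac{1}{10^{6}} + \frac{1}{10^{24}} + \cdots = \sum_{k \ge 1} \frac{1}{10^{k!}}$$

is a Liouville number, since if $p/q$ with $q = 10^{n!}$ is the sum of the first $n$ terms, then $0 < \alpha - p/q < 2/q^{n+1} < 1/q^n$.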
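A worked version of this tail estimate (our own expansion of the one-line argument): with $q = 10^{n!}$, the terms left over after the first $n$ have distinct exponents $k! \ge (n+1)!$, so comparing with a geometric series gives

$$0 < \alpha - \frac{p}{q} = \sum_{k \ge n+1} \frac{1}{10^{k!}} \le \sum_{j \ge (n+1)!} \frac{1}{10^{j}} = \frac{10}{9} \cdot \frac{1}{10^{(n+1)!}} < \frac{2}{q^{n+1}} < \frac{1}{q^n},$$

using $10^{(n+1)!} = (10^{n!})^{n+1} = q^{n+1}$ and, in the last step, $q \ge 10 > 2$.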

Liouville numbers are easily identifiable transcendental numbers, but there are many transcendental numbers which are not Liouville numbers, like $\pi$ and $e$.




Liouville’s Theorem has been improved to its, more or less, final form by Roth. To explain his result we have to introduce an $\epsilon$ and that sort of thing: For any fixed $\epsilon > 0$ (which should be thought of as being small), there exists a constant $\kappa_\epsilon > 0$, which depends on $\epsilon$ and is chosen so that it works in the proof.⁷ In Roth’s Theorem we have to go a little further than this, since the constant also depends on the value of $\alpha$ we need to approximate, so our constant is $c_{\alpha,\epsilon}$, which depends on both $\alpha$ and $\epsilon$, but nothing else. These dependencies do restrict our use of the inexplicit constants $c_{\alpha,\epsilon}$; for example, one cannot compare the constants that arise from different values of $\alpha$.


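Theorem 11.4 (Roth’s Theorem, 1955). Suppose that $\alpha$ is an irrational real algebraic number. For any fixed $\epsilon > 0$ there exists a constant $c_{\alpha,\epsilon} > 0$ such that for any rational $p/q$ with $(p, q) = 1$ and $q \ge 1$ we have

$$\left|\alpha - \frac{p}{q}\right| > \frac{c_{\alpha,\epsilon}}{q^{2+\epsilon}}.$$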


The exponent “$2 + \epsilon$” in Roth’s Theorem cannot be improved much since if $\alpha$ is irrational, then there are infinitely many $p/q$ with $|\alpha - p/q| < 1/q^2$ by Corollary 11.1.1. We will prove that approximations which are a little better than this must be convergents of the continued fraction of $\alpha$ (see Corollary 11.10.1 in section 11.10 of appendix 11B). The “worst approximable” irrational number is therefore $\frac{1 + \sqrt{5}}{2}$, for which the best approximations are given by $F_{n+1}/F_n$, where $F_n$ is the $n$th Fibonacci number. One can show that the difference $\left|\frac{1 + \sqrt{5}}{2} - \frac{F_{n+1}}{F_n}\right|$ is roughly $1/(\sqrt{5}F_n^2)$, with an error $< 1/F_n^4$.
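The last claim is easy to test numerically. This Python sketch, an illustration rather than a proof, uses the standard `decimal` module with 60 significant digits (our choice) and checks, for $1 \le n \le 40$ with $F_1 = F_2 = 1$, that $\left|\left|\frac{1+\sqrt5}{2} - \frac{F_{n+1}}{F_n}\right| - \frac{1}{\sqrt5 F_n^2}\right| < \frac{1}{F_n^4}$. The function name is ours.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
SQRT5 = Decimal(5).sqrt()
PHI = (1 + SQRT5) / 2

def fib_check(n_max: int) -> bool:
    """Check | |phi - F_{n+1}/F_n| - 1/(sqrt5 F_n^2) | < 1/F_n^4 for 1 <= n <= n_max."""
    f_n, f_next = 1, 1                   # F_1, F_2
    for n in range(1, n_max + 1):
        F = Decimal(f_n)
        diff = abs(PHI - Decimal(f_next) / F)
        if not abs(diff - 1 / (SQRT5 * F * F)) < 1 / F ** 4:
            return False
        f_n, f_next = f_next, f_n + f_next
    return True

print(fib_check(40))  # True
```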


Exercise 11.4.1. Prove that if $\alpha \in \mathbb{C} \setminus \mathbb{R}$, then there exists a constant $\beta_\alpha > 0$ such that $|\alpha - p/q| > \beta_\alpha$ for all rational approximations $p/q$.


Exercise 11.4.2. Prove that if $f(t) = a_d \prod_{i=1}^{d} (t - \alpha_i)$, then $f'(\alpha_j) = a_d \prod_{1 \le i \le d,\ i \ne j} (\alpha_j - \alpha_i)$.


There are many beautiful applications of Roth’s Theorem to Diophantine equations. We highlight one:


Corollary 11.4.2 (Thue-Siegel Theorem). Suppose that $f(t) = a_0 + a_1 t + \cdots + a_d t^d \in \mathbb{Z}[t]$ is an irreducible polynomial of degree $d \ge 3$. Then for any integer $A$ there are only finitely many pairs of integers $m, n$ for which

$$n^d f(m/n) = a_0 n^d + a_1 n^{d-1} m + \cdots + a_d m^d = A.$$


⁷A proof that is far too involved for inclusion in this book.




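Proof. If $A = 0$, the only solution is $m = n = 0$, as $f$ is irreducible. So we may assume that $|A| \ge 1$ and write $f(t) = a_d \prod_{i=1}^{d} (t - \alpha_i)$; the $\alpha_i$ are distinct as $f(t)$ is irreducible. For any given pair of integers $m, n$ select $j$ so that $|\alpha_j - \frac{m}{n}|$ is minimized. If $i \ne j$, then

$$2\left|\alpha_i - \frac{m}{n}\right| \ge \left|\alpha_i - \frac{m}{n}\right| + \left|\alpha_j - \frac{m}{n}\right| \ge |\alpha_i - \alpha_j|,$$

so that, since $f'(\alpha_j) = a_d \prod_{1 \le i \le d,\ i \ne j} (\alpha_j - \alpha_i)$ (as in exercise 11.4.2),

$$\left|\alpha_j - \frac{m}{n}\right| \cdot \frac{|f'(\alpha_j)|}{2^{d-1}} = \left|\alpha_j - \frac{m}{n}\right| |a_d| \prod_{\substack{1 \le i \le d \\ i \ne j}} \frac{|\alpha_j - \alpha_i|}{2} \le |a_d| \prod_{1 \le i \le d} \left|\alpha_i - \frac{m}{n}\right| = \frac{|a_0 n^d + a_1 n^{d-1} m + \cdots + a_d m^d|}{|n|^d} = \frac{|A|}{|n|^d}.$$

We now apply Roth’s Theorem with $\alpha = \alpha_j$ and $\epsilon = \frac{1}{2}$, so that

$$\left|\alpha_j - \frac{m}{n}\right| > \frac{c_{\alpha_j,1/2}}{|n|^{5/2}}.$$

Substituting this into the previous equation, then squaring both sides and multiplying through by denominators, we obtain either $|n| \le 1$ or

$$\frac{|n|}{2} \le \left(\frac{|n|}{2}\right)^{2d-5} < B,$$

where $B = 8\max_j \big(|A|/(c_{\alpha_j,1/2}|f'(\alpha_j)|)\big)^2$. Either way there are only finitely many possibilities for the integer $n$, and for each such $n$ there are at most $d$ integers $m$ which can be roots of the polynomial

$$a_d x^d + \cdots + a_1 n^{d-1} x + (a_0 n^d - A) = 0.$$

This proves the claimed result.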


11.5. The abc-conjecture 


In chapter 6 we discussed various Diophantine equations with three monomials like $x^2 + y^2 = z^2$, even $x^n + y^n = z^n$ for any integer $n > 2$, and there are others of interest like $x^2 - y^3 = 1$. So how do we determine which of these have infinitely many solutions in integers? This is not an easy question, and indeed it is the focus of a lot of research. One modern approach (motivated by deep considerations) is to study the prime powers dividing each term.


We begin by proving the following consequence of Roth’s Theorem: 


Corollary 11.5.1. Let $F(x, y) \in \mathbb{Z}[x, y]$ be a homogeneous polynomial of degree $d$, with no repeated linear factors. For each $\epsilon > 0$ there exists a constant $\kappa_{F,\epsilon} > 0$ such that for any coprime positive integers $m, n$:

Either $F(m, n) = 0$ or $|F(m, n)| > \kappa_{F,\epsilon}|n|^{d-2-\epsilon}$.

In other words, either $F(m, n) = 0$ or $|F(m, n)|$ is large.


Proof. A homogeneous polynomial in two variables takes the form

$$F(x, y) = \sum_{j=0}^{d} a_j x^j y^{d-j}.$$
j=0 




As there are no repeated factors, $F(x, y)$ can be divisible by $y$ but not by $y^2$. Then $f(t) = F(t, 1)$ is a polynomial of degree $d - 1$ or $d$ (depending on whether or not $F(x, y)$ is divisible by $y$) and has no repeated roots (as $F$ has no repeated linear factors).


Now if $m$ and $n$ are coprime integers, then either $F(m, n) = 0$ or, from the inequality in the proof of the Thue-Siegel Theorem,

$$\frac{|F(m, n)|}{|n|^d} = |F(m/n, 1)| = |f(m/n)| \ge \frac{|f'(\alpha_j)|}{2^{d-1}}\left|\alpha_j - \frac{m}{n}\right| > \frac{\kappa_{F,\epsilon}}{|n|^{2+\epsilon}},$$

with $\kappa_{F,\epsilon} := \min_j c_{\alpha_j,\epsilon}|f'(\alpha_j)|/2^{d-1}$, where the last inequality follows from Roth’s Theorem.⁸ The result follows by multiplying each side through by $|n|^d$.


Exercise 11.5.1. Let $\alpha$ be an algebraic number which is a root of $f(t) \in \mathbb{Z}[t]$, a polynomial of degree $d$. Let $F(x, y) = y^d f(x/y)$, and suppose that there exists a constant $\kappa > 0$ such that $|F(m, n)| > \kappa|n|^{d-2-\epsilon}$ for all integers $m, n$. Deduce that there exists a constant $c > 0$ such that $|\alpha - m/n| > c/|n|^{2+\epsilon}$ for all integers $m, n \ne 0$. (Thus Corollary 11.5.1 is “equivalent” to Roth’s Theorem.)


We are going to move to what seems to be a rather different question but will eventually tie in closely with Corollary 11.5.1. We study pairwise coprime, positive integer solutions to the equation

$$a + b = c,$$

bounding the size of $a$, $b$, and $c$ in terms of the product of the distinct primes that divide $a$, $b$, and $c$:


Conjecture 11.1 (The abc-conjecture). Fix $\epsilon > 0$. There exists a constant $\kappa_\epsilon > 0$ such that if $a$ and $b$ are coprime positive integers with $c = a + b$, then

$$\prod_{\substack{p \text{ prime} \\ p \mid abc}} p \ge \kappa_\epsilon\, c^{1-\epsilon}.$$

This is the abc-conjecture, one of the great open questions of modern mathematics.
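To experiment with the conjecture one needs the product of the distinct primes dividing $abc$, usually called the radical of $abc$. The Python sketch below computes it by trial division (fine for small numbers) together with the ratio $\log c / \log \prod_{p \mid abc} p$, which the literature calls the “quality” of the triple; both names are standard outside this book but are not notation used in this chapter. A quality above 1 means $c$ exceeds the product of the primes.

```python
from math import gcd, log

def radical(n: int) -> int:
    """Product of the distinct primes dividing n >= 1 (trial division)."""
    rad, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    return rad * n if n > 1 else rad

def abc_quality(a: int, b: int) -> float:
    """log c / log rad(abc) for coprime positive a, b with c = a + b."""
    assert gcd(a, b) == 1
    c = a + b
    return log(c) / log(radical(a * b * c))

print(radical(1 * 8 * 9), abc_quality(1, 8))   # 6 and about 1.226
print(abc_quality(2, 3**10 * 109))             # about 1.63 (here c = 23**5)
```

The second example, $2 + 3^{10} \cdot 109 = 23^5$, is a well-known triple of unusually high quality.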

For example, if we have a putative solution to Fermat’s Last Theorem, like $x^n + y^n = z^n$ with $x, y, z > 0$, then we take $a = x^n$, $b = y^n$, and $c = z^n$. Now the product of the primes dividing $abc = (xyz)^n$ is the same as the product of the primes dividing $xyz$. Therefore the abc-conjecture with $\epsilon = 1/5$ implies, for $n \ge 5$, that

$$\kappa\,(z^n)^{4/5} \le \prod_{\substack{p \text{ prime} \\ p \mid x^n y^n z^n}} p = \prod_{\substack{p \text{ prime} \\ p \mid xyz}} p \le xyz < z^3 \le z^{3n/5},$$

where $\kappa = \kappa_{1/5}$, from which we deduce $z^n \le 1/\kappa^5$. Since $x^n, y^n < z^n$ we deduce, from the abc-conjecture, that in every solution to $x^n + y^n = z^n$ with $n \ge 5$, the numbers $x^n$, $y^n$, and $z^n$ are all bounded by some absolute constant, and therefore there are only finitely many solutions. Therefore we have proved that the abc-conjecture implies that there are only finitely many solutions to $x^n + y^n = z^n$ with $(x, y) = 1$ and $n \ge 5$.

⁸Yet again this seems like a lot of notation for a constant, especially an inexplicit constant, but the notation reflects what the constant depends on, and given the complicated derivation of this constant, it is certainly simpler not to try to be explicit about it.


One can compare the abc-conjecture with the abc-theorem for polynomials (as in section 6.7 of appendix 6A). The size of the integers replaces the degrees of the polynomials; the prime divisors replace the irreducible polynomial factors. One cannot prove the abc-conjecture in the same way, since in our proof of the abc-theorem for polynomials we relied heavily on calculus, for which there is no analogue for integers.


We now state a conjecture which implies both the abc-conjecture and Corollary 11.5.1 of Roth’s Theorem:

Conjecture 11.2 (The abc-Roth conjecture). Let $F(x, y) \in \mathbb{Z}[x, y]$ be a homogeneous polynomial of degree $d$, with no repeated linear factors. For each $\epsilon > 0$ there exists a constant $\kappa_{F,\epsilon} > 0$ such that for any coprime positive integers $m, n$, either $F(m, n) = 0$ or

$$\prod_{\substack{p \text{ prime} \\ p \mid F(m,n)}} p \ge \kappa_{F,\epsilon}|n|^{d-2-\epsilon}.$$

The abc-Roth conjecture implies both Corollary 11.5.1, since the product of the primes dividing non-zero $F(m, n)$ is $\le |F(m, n)|$, and the abc-conjecture, taking $F(x, y) = xy(x + y)$ (since then $F(a, b) = abc$ when $a + b = c$). Quite remarkably, Conjecture 11.2 follows from the abc-conjecture using some clever algebraic geometry. (See [2].)


Further reading for this chapter 


[1] Edward B. Burger, Diophantine olympics and world champions: Polynomials and primes down under, Amer. Math. Monthly 107 (2000), 822-829.

[2] Andrew Granville and Thomas J. Tucker, It’s as easy as abc, Notices Amer. Math. Soc. 49 (2002), 1224-1231.

[3] Serge Lang, Old and new conjectured Diophantine inequalities, Bull. Amer. Math. Soc. 23 (1990), 37-75.

[4] H. W. Lenstra, Jr., Solving the Pell equation, Notices Amer. Math. Soc. 49 (2002), 182-192.

[5] Barry Mazur, Questions about powers of numbers, Notices Amer. Math. Soc. 47 (2000), no. 2, 195-202.


Additional exercises 


Exercise 11.6.1. Suppose $(p, q) = 1$ and $q \ge 1$. Determine all rationals $m/n$ for which $\left|\frac{p}{q} - \frac{m}{n}\right| = \frac{1}{qn}$.


Exercise 11.6.2. Reprove exercise 7.10.2(a) using (11.0.1).


Exercise 11.6.3.† Prove that there are infinitely many solutions to the Pell equation $u^2 - dv^2 = 1$ with $u \equiv 1 \pmod d$.


Exercise 11.6.4. Prove that if $\alpha$ is transcendental, then so is $\alpha^k$ for every non-zero integer $k$.




Exercise 11.6.5 (The “three gaps” theorem). Given $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, we put the fractional parts $\{\alpha\}, \{2\alpha\}, \dots, \{N\alpha\} \in [0, 1)$ in ascending order as $0 < \{a_1\alpha\} < \{a_2\alpha\} < \cdots < \{a_N\alpha\} < 1$ (so that $\{a_1, \dots, a_N\}$ is a reordering of $\{1, \dots, N\}$). We will prove that there are at most three distinct values in the set of consecutive differences, $D(A) := \{\{a_{j+1}\alpha\} - \{a_j\alpha\} : j = 1, \dots, N - 1\}$.
(a) Show that if $\{(a_{j+1} - 1)\alpha\} - \{(a_j - 1)\alpha\} \notin D(A)$, then either $a_j = 1$ or $a_{j+1} = 1$, or there exists $k$ such that $\{(a_j - 1)\alpha\} < \{a_k\alpha\} < \{(a_{j+1} - 1)\alpha\}$.
(b) Show that if $\{(a_j - 1)\alpha\} < \{a_k\alpha\} < \{(a_{j+1} - 1)\alpha\}$, then $a_k = N$.
(c) Deduce from (a) and (b) that every element of $D(A)$ equals one of $\{a_1\alpha\}$, $1 - \{a_N\alpha\}$, or $\{a_1\alpha\} + 1 - \{a_N\alpha\}$.


Exercise 11.6.6. Suppose that $a$ and $b$ are given integers, with $3 \nmid a$.
(a) Show that we can select a congruence class $r \pmod 3$ such that if the integer $m \equiv r \pmod 3$ and $x + y\sqrt{3} = (2 + \sqrt{3})^m(a + b\sqrt{3})$, then 3 divides $y$.
(b) Deduce that if the integer $N$ can be written in the form $a^2 - 3b^2$ where $3 \nmid N$, then there are infinitely many pairs of powerful numbers that differ by exactly $N$.


Exercise 11.6.7. Find an explicit value that can be used for $c_\alpha$ in Liouville’s Theorem when $\alpha = \sqrt{D}$ where $D > 1$ is a squarefree positive integer.


Exercise 11.6.8. Fix $\epsilon > 0$, and integers $a_0, \dots, a_d$. Deduce from Roth’s Theorem that there are only finitely many pairs of coprime integers $m, n$ for which $|a_0 n^d + a_1 n^{d-1} m + \cdots + a_d m^d| < \max\{|m|, |n|\}^{d-2-\epsilon}$.

Exercise 11.6.9. Assume the abc-conjecture to show that there are only finitely many sets of integers $x, y > 0$ and $p, q > 1$ for which $x^p - y^q = 1$.


Exercise 11.6.10. Suppose that $x^p + y^q = z^r$ with $x, y, z$ pairwise coprime and $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} < 1$.
(a) Prove that $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} \le \frac{41}{42}$.
(b) Assume the abc-conjecture. Prove that there exists a constant $B$ for which $|x^p|, |y^q|, |z^r| \le B$.
Exercise 11.6.11. The abc-conjecture is “best possible” in that one cannot take $\epsilon = 0$. To establish this, we need to find examples of solutions to $a + b = c$ in which $(1/c)\prod_{p \mid abc} p$ gets arbitrarily small.

(a) Prove that if $m^2 \mid b$, then $\prod_{p \mid b} p \le b/m$.

(b) Prove that for any odd integer $m$ there exists an integer $n$ for which $2^n \equiv 1 \pmod{m^2}$.
(c)† Combine these two observations to show that for any $\epsilon > 0$ there exist coprime integers $a + b = c$ for which $\prod_{p \mid abc} p < \epsilon c$.


Appendix 11A. Uniform distribution


11.7. $n\alpha$ mod 1


Dirichlet’s Theorem implies that $n\alpha$ mod 1 gets arbitrarily close to 0 as $n$ runs through a sequence of integers. One might also ask whether $n\alpha$ mod 1 gets arbitrarily close to any given $\theta \in (0, 1)$.


Theorem 11.5 (Kronecker’s Theorem). If $\alpha$ is a real irrational number, then the numbers $\{n\alpha\}$ are dense in $[0, 1)$.


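Proof. Fix $\epsilon > 0$. By Dirichlet’s Theorem there exists an integer $n$ with $\|n\alpha\| < \epsilon$, where $\|t\|$ is the distance from $t$ to the nearest integer. As $\alpha$ is irrational we also have that $\|n\alpha\| \ne 0$, and so $\{n\alpha\} \in (0, \epsilon)$ or $\{n\alpha\} \in (1 - \epsilon, 1)$. We will assume that $\{n\alpha\} \in (0, \epsilon)$ (the case with $\{n\alpha\} \in (1 - \epsilon, 1)$ being proved analogously).

Let $\delta = \{n\alpha\} \in (0, \epsilon)$. Select $D$ to be the largest integer $< 1/\delta$, and so

$$\{n\alpha\}, \{2n\alpha\}, \dots, \{Dn\alpha\} = \delta, 2\delta, \dots, D\delta$$

is a set of points in $[0, 1)$, consecutive points being spaced $\delta < \epsilon$ apart. Therefore if $\theta \in [0, 1)$, then we let $k = \lfloor \theta/\delta \rfloor$, and so $\theta - k\delta \in [0, \delta)$, which implies that

$$\theta - \{kn\alpha\} = \theta - k\{n\alpha\} = \theta - k\delta \in [0, \delta) \subset [0, \epsilon).$$

That is, there are integer multiples of $\alpha$ in $\mathbb{R}/\mathbb{Z}$ that are arbitrarily close to $\theta$.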
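The proof is constructive enough to run. This Python sketch follows it step by step for a given irrational $\alpha$ (represented as a float, so only moderately small $\epsilon$ are sensible): it searches for $n$ with $0 < \{n\alpha\} < \epsilon$ (the case treated above; skipping values of $\{n\alpha\}$ near 1 is our simplification), then returns $kn$ with $k = \lfloor \theta/\delta \rfloor$. The function name is ours.

```python
import math

def kronecker_multiple(alpha: float, theta: float, eps: float) -> int:
    """Return m >= 0 with 0 <= theta - {m*alpha} < eps, following the proof of
    Kronecker's Theorem (searching only for {n*alpha} in (0, eps))."""
    n = 1
    while not (0 < (n * alpha) % 1.0 < eps):   # find delta = {n alpha} in (0, eps)
        n += 1
    delta = (n * alpha) % 1.0
    k = math.floor(theta / delta)              # then theta - k*delta lies in [0, delta)
    return k * n

m = kronecker_multiple(math.sqrt(2), 0.5, 0.01)
print(m, (m * math.sqrt(2)) % 1.0)             # {m sqrt 2} is just below 0.5
```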


Exercise 11.7.1. Show that the conclusion of the theorem is not true if $\alpha$ is rational.


Exercise 11.7.2. Prove Kronecker’s Theorem when $\{n\alpha\} \in (1 - \epsilon, 1)$.


Now that we know that if $\alpha$ is irrational, then $n\alpha$ mod 1 gets arbitrarily close to any given $\theta \in [0, 1)$, we might ask how often $n\alpha$ mod 1 gets close to each $\theta \in [0, 1)$. Are the values of $n\alpha$ mod 1 roughly equidistributed? To answer this question we must determine how often $\{n\alpha\} \in [\theta - \epsilon, \theta + \epsilon]$ for $\theta \in (0, 1)$ and sufficiently small $\epsilon > 0$. If the numbers $\{n\alpha\}$ are equidistributed, then we might expect the frequency to be roughly proportional to the length of the interval. The analogous question can be asked for any sequence of numbers $x_1, x_2, \dots \in [0, 1)$. We say that $\{x_n\}_{n \ge 1}$ is uniformly distributed mod 1 (or equidistributed mod 1) if for any $a < b \in [0, 1)$,

$$\lim_{N \to \infty} \frac{\#\{n \le N : a \le x_n < b\}}{N} \text{ exists and equals } b - a.$$

The values of $x$ (mod 1) are in one-to-one correspondence with the values of $e(x)$ (where $e(t) := e^{2i\pi t}$), as its value depends only on $x$ (mod 1). Moreover the values $e(kx)$ for any given integer $k \ne 0$ remain consistent for $x$ with any given value mod 1. That is, if $x = m + \theta$ with $0 \le \theta < 1$, then $kx = km + k\theta$, so that $\{kx\} = \{k\theta\}$. This suggests that to study a sequence of values $x_n$ mod 1, we might use Fourier analysis. This thinking leads to the famous theorem of Hermann Weyl (for more on this, including the proof, see Ns).


Theorem 11.6 (Weyl’s uniform distribution theorem). The sequence $\{x_n\}_{n \ge 1}$ is uniformly distributed mod 1 if and only if for all non-zero integers $k$ we have

$$\lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^{N} e(kx_n) \text{ exists and equals } 0.$$


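Exercise 11.7.3. (a) Show that $\sum_{n=1}^{N} e(\{n\alpha\}) = \frac{e(N\alpha) - 1}{1 - e(-\alpha)}$ if $\alpha \notin \mathbb{Z}$, and then deduce that

$$\left|\frac{1}{N}\sum_{n=1}^{N} e(k\{n\alpha\})\right| \le \frac{2}{N\,|1 - e(-k\alpha)|} \quad \text{whenever } k\alpha \notin \mathbb{Z}.$$

(b) Use Weyl’s uniform distribution theorem to deduce that if $\alpha$ is a real, irrational number, then $\{n\alpha\}_{n \ge 1}$ is uniformly distributed mod 1.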
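Numerically, both Weyl’s criterion and the definition of uniform distribution are easy to probe. This Python sketch (an illustration, not a proof; the sample size and the test interval are arbitrary choices of ours) computes $\frac{1}{N}\sum_{n \le N} e(kn\alpha)$ with `cmath.exp`, compares it with the bound $2/(N|1 - e(k\alpha)|)$ that comes from summing the geometric series, and counts how often $\{n\alpha\}$ lands in $[a, b)$.

```python
import cmath
import math

def e(t: float) -> complex:
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)

def weyl_average(alpha: float, k: int, N: int) -> complex:
    """(1/N) * sum_{n=1}^{N} e(k * {n alpha})."""
    return sum(e(k * ((n * alpha) % 1.0)) for n in range(1, N + 1)) / N

def frequency(alpha: float, a: float, b: float, N: int) -> float:
    """Proportion of 1 <= n <= N with a <= {n alpha} < b."""
    return sum(1 for n in range(1, N + 1) if a <= (n * alpha) % 1.0 < b) / N

alpha, N = math.sqrt(2), 100_000
for k in (1, 2, 3):
    bound = 2 / (N * abs(1 - e(k * alpha)))   # geometric-series bound
    print(k, abs(weyl_average(alpha, k, N)), bound)
print(frequency(alpha, 0.2, 0.5, N))          # close to b - a = 0.3
```

For rational $\alpha$ the averages need not tend to 0: with $\alpha = 1/2$ and $k = 2$ every term equals 1.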


One can prove that $\{n\alpha\}$ is uniformly distributed mod 1 using fairly elementary ideas, though it is not easy:


Exercise 11.7.4. Let $x_1, x_2, \dots \in [0, 1)$ be a sequence of numbers. Suppose that there are arbitrarily large integers $M$ for which

$$\lim_{N \to \infty} \frac{1}{N}\,\#\left\{n \le N : \frac{m}{M} \le x_n < \frac{m+1}{M}\right\} \text{ exists and equals } \frac{1}{M},$$

for $0 \le m \le M - 1$. Deduce that $\{x_n\}_{n \ge 1}$ is uniformly distributed mod 1.


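Exercise 11.7.5.† Let $\alpha$ be a real, irrational number. In this exercise we sketch a proof that $\{n\alpha\}_{n \ge 1}$ is uniformly distributed mod 1. Fix $\epsilon > 0$ arbitrarily small.
(a) Use Kronecker’s Theorem to show that there exists an integer $N \ge 1$ such that $\{N\alpha\} = \delta \in (0, \epsilon)$.
(b) Prove that if $\{n\alpha\} < 1 - \delta$, then $\{(n + N)\alpha\} = \{n\alpha\} + \delta$. What if $\{n\alpha\} \ge 1 - \delta$?
(c) Suppose that $0 \le t < 1 - 2\delta$. Show that $\{n\alpha\} \in [t, t + \delta)$ if and only if $\{(n + N)\alpha\} \in [t + \delta, t + 2\delta)$, and so deduce that

$$\big|\#\{1 \le n \le x : t \le \{n\alpha\} < t + \delta\} - \#\{1 \le n \le x : t + \delta \le \{n\alpha\} < t + 2\delta\}\big| \le N.$$

Now let $\delta = 1/M$ for some large integer $M$.
(d)† Use (c) to show that if $0 \le m \le M - 1$, then

$$\#\left\{1 \le n \le x : \frac{m}{M} \le \{n\alpha\} < \frac{m+1}{M}\right\} < \cdots$$

(e) Deduce that $\{n\alpha\}_{n \ge 1}$ is uniformly distributed mod 1 using exercise 11.7.4.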




Kronecker’s Theorem in n dimensions. In exercise 11.1.2 we saw that Dirichlet’s Theorem may be generalized to $k$ dimensions; that is, given $\alpha_1, \dots, \alpha_k \in \mathbb{R}$, for any $\epsilon > 0$ there exist infinitely many integers $n$ such that each $\|n\alpha_j\| < \epsilon$. To generalize Kronecker’s Theorem we would like that for $\theta_1, \dots, \theta_k \in \mathbb{R}$ there are infinitely many $n$ for which each $\|n\alpha_j - \theta_j\| < \epsilon$. However this is not true in all cases, even when $k = 1$: In the hypothesis of Theorem 11.5 we needed that $\alpha$ is irrational, and we showed that this is necessary in exercise 11.7.1. Another way to state that $\alpha$ is irrational is to insist that 1 and $\alpha$ are linearly independent over $\mathbb{Z}$.


In two dimensions we find another obstruction: Suppose that $\alpha_1 = \alpha$ and $\alpha_2 = 1 - \alpha$. If $\|n\alpha_j - \theta_j\| < \epsilon$ for each $j$, then

$$\|\theta_1 + \theta_2\| = \|n - \theta_1 - \theta_2\| \le \|n\alpha_1 - \theta_1\| + \|n\alpha_2 - \theta_2\| < 2\epsilon.$$

But this should hold for any $\epsilon > 0$, which implies that $\theta_1 + \theta_2$ is an integer. Notice that in this example $1, \alpha_1, \alpha_2$ are not linearly independent over $\mathbb{Z}$.


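Exercise 11.7.6. Let $\alpha_1, \dots, \alpha_k, \theta_1, \dots, \theta_k \in \mathbb{R}$ be given, and assume that there are integers $c_0, \dots, c_k$ for which $c_0 + c_1\alpha_1 + \cdots + c_k\alpha_k = 0$. Suppose that for all $\epsilon > 0$ there are infinitely many $n$ for which $\|n\alpha_j - \theta_j\| < \epsilon$ for $j = 1, 2, \dots, k$. Prove that $c_1\theta_1 + \cdots + c_k\theta_k \in \mathbb{Z}$.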


These are the only obstructions to the generalization: 


Theorem 11.7 (Kronecker’s Theorem in n dimensions). Assume that the real numbers $1, \alpha_1, \dots, \alpha_k$ are linearly independent over $\mathbb{Z}$. Then the points

$$(n\alpha_1, \dots, n\alpha_k)_{n \ge 1} \text{ are dense in } (\mathbb{R}/\mathbb{Z})^k.$$

In other words, for any given $\theta_1, \dots, \theta_k \in \mathbb{R}$ and any $\epsilon > 0$ there are infinitely many integers $n$ for which $\|n\alpha_j - \theta_j\| < \epsilon$ for all $j = 1, \dots, k$.


This can be proved in several different ways that are accessible though tough. We refer the reader to sections 23.5-23.8 of [HW08].


11.8. Bouncing billiard balls 


Billiards, snooker, and pool are all played on a rectangular table, hitting the ball along the surface. The sides of the table are cushioned so that the ball bounces off the side at the opposite angle to which it hits. That is, if it hits at angle $\alpha^\circ$, then it bounces off at angle $(180 - \alpha)^\circ$. Sometimes one miscues and the ball carries on around the table, coming to a stop without hitting another ball. Have you ever wondered what would happen if there were no friction, so that the ball never stops? Would your ball eventually hit the ball it is supposed to hit, no matter where that other ball is placed? Or could it go on bouncing forever without ever getting to the other ball? We could rephrase this question more mathematically by supposing that we play on a table in the complex plane, with two sides along the $x$- and $y$-axes. Say the table length is $\ell$ and width is $w$, so that it is the rectangle with corners at $(0,0)$, $(0,\ell)$, $(w,0)$, $(w,\ell)$. Let us suppose that the ball is hit from the point $(u, v)$ along a line with slope $\alpha$ (that is, at an angle $\arctan\alpha$ from the horizontal).




As the line continues on indefinitely inside the box, does it get arbitrarily close to 
every point inside the box? 


Exercise 11.8.1. Show that by rescaling with the map $x \mapsto x/\ell$, $y \mapsto y/w$ we can assume, without any loss of generality, that the billiards table is the unit square.


As a consequence of exercise 11.8.1 we may henceforth assume that $w = \ell = 1$.

The ball would run along the line $\mathcal{L} := \{(u + t, v + \alpha t) : t \ge 0\}$ if it did not hit the sides of the table. Notice though that if, after each time it hit a side, we reflected the true trajectory through the line that represents that side, then indeed the ball’s trajectory would be $\mathcal{L}$.


Figure 11.1. Billiards on the complex plane and on the unit square. Following a path inside the fundamental domain of a lattice: The path segment $c_j$ gets mapped to $\ell_j$ for $j = 2, \dots, 6$.




Develop this to prove: 


Exercise 11.8.2. Show that the billiard ball is at $(x, y)$ after time $t$, where $x$ and $y$ are given as follows:

Let $m = \lfloor u + t \rfloor$. If $m$ is even, let $x = \{u + t\}$; if $m$ is odd, let $x = 1 - \{u + t\}$.

Let $n = \lfloor v + \alpha t \rfloor$. If $n$ is even, let $y = \{v + \alpha t\}$; if $n$ is odd, let $y = 1 - \{v + \alpha t\}$.

Exercise 11.8.3. Show that if $\alpha$ is rational, then the ball eventually ends up exactly where it started from, and so it does not get arbitrarily close to every point on the table.
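The formulas of exercise 11.8.2 translate directly into code. A minimal Python sketch for the unit table, with the ball starting at $(u, v)$ and moving along slope $\alpha$; the function name is ours, and floating-point rounding means positions are only approximate. The second usage line illustrates exercise 11.8.3 for the rational slope $\alpha = 1$: after time 2 the ball is back at its starting point.

```python
import math

def billiard_position(u: float, v: float, alpha: float, t: float) -> tuple[float, float]:
    """Position at time t of a ball on the unit square that starts at (u, v)
    and follows the line (u + t, v + alpha*t), unfolded by reflections."""
    def fold(z: float) -> float:
        m = math.floor(z)
        frac = z - m
        return frac if m % 2 == 0 else 1 - frac   # odd floor: reflect back into [0, 1]
    return fold(u + t), fold(v + alpha * t)

print(billiard_position(0.3, 0.4, 1.0, 0.8))   # approximately (0.9, 0.8)
print(billiard_position(0.3, 0.4, 1.0, 2.0))   # approximately (0.3, 0.4)
```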


So how close does the trajectory get to the point $(r, s)$, where $r, s \in [0, 1)$? Let us consider all of those values of $t$ for which $x = r$, with $m$ and $n$ even to simplify matters (with $m$ and $n$ as in exercise 11.8.2), and see if we can determine whether $y$ is ever close to $s$.


Exercise 11.8.4. Show that $\lfloor z \rfloor$ is even if and only if $\{z/2\} \in [0, 1/2)$. Deduce that $\lfloor z \rfloor$ is even and $\{z\} = r$ if and only if $\{z/2\} = r/2$.


Hence we want $(u + t)/2 = k + r/2$ for some integer $k$; that is, $t = 2k + (r - u)$, $k \in \mathbb{Z}$. In that case $v + \alpha t = 2\alpha k + \alpha(r - u) + v$, so we want $\{\alpha k + (\alpha(r - u) + v)/2\}$ close to $s/2$. That is, $k\alpha$ mod 1 should be close to $\theta := \{(s - \alpha(r - u) - v)/2\}$. Now, in Kronecker’s Theorem (Theorem 11.5) we showed that the values $k\alpha$ mod 1 are dense in $[0, 1)$ when $\alpha$ is irrational, and so in particular there are values of $k$ that allow $k\alpha$ mod 1 to be arbitrarily close to $\theta$. Hence we have proved the difficult part of the following corollary:


Corollary 11.8.1. If $\alpha$ is a real irrational number, then any ball moving along a line of slope $\alpha$ will eventually get arbitrarily close to any point on a 1-by-1 billiards table.


We finish with a challenge question to develop a similar theory of billiards 
played on a circular table! 


Exercise 11.8.5. Imagine a trajectory inside the unit circle. A ball is hit and continues indefinitely. When it hits a side at angle $\theta$ (compared to the normal line at that point), it bounces off at angle $-\theta$.

(a) Suppose that the first two points at which the ball hits the edge are at $e(\beta)$ and then at $e(\beta + \alpha)$. Show that the ball hits the edge at $e(\beta + n\alpha)$ for $n = 0, 1, 2, \dots$.

(b) Prove that the ball falls into a repeated trajectory if and only if $\alpha$ is rational.

(c) Show that if $\alpha$ is irrational, then the points at which the ball hits the circle edge are dense (i.e., eventually the ball comes arbitrarily close to any point on the edge) but that it never hits the same edge point twice.

(d) Prove that the ball’s trajectory never comes inside the circle of radius $|\cos(\pi\alpha)|$. Deduce that the trajectory of the ball is never dense inside the unit circle.

(e) Prove that if $\alpha$ is irrational, then the trajectory of the ball is dense inside the ring between the circle of radius $|\cos(\pi\alpha)|$ and the circle of radius 1. (The technical word for a ring is an annulus.)


Appendix 11B. Continued fractions


We introduced continued fractions in section 1.5. We noted that the convergents $1, 2, \frac{7}{4}, \frac{16}{9}, \frac{23}{13}$ of the continued fraction of $\frac{85}{48}$ “yield increasingly good approximations to 85/48”. Actually more is true: These are the best approximations to $\alpha := 85/48$ in the sense that if

$$\|q\alpha\| = \min_{p \in \mathbb{Z}} |q\alpha - p|,$$

then the smallest values so far in the sequence $\|\alpha\|, \|2\alpha\|, \dots$ are always given by $\|q\alpha\|$ with $q = q_n$ for some $n$. When $\alpha = \frac{85}{48}$ our approximations yield the values

$$|\alpha - 2| = \frac{11}{48}, \quad |4\alpha - 7| = \frac{4}{48}, \quad |9\alpha - 16| = \frac{3}{48}, \quad |13\alpha - 23| = \frac{1}{48},$$

and these are the best approximations because $|q\alpha - p| > |q_n\alpha - p_n|$ for all $q < q_n$.
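This claim about record minima is easy to confirm by exhaustive search. The Python sketch below uses exact arithmetic via `fractions.Fraction` and lists every $q \le 48$ at which $\|q\alpha\|$ is strictly smaller than at all earlier $q$; the function names are ours.

```python
import math
from fractions import Fraction

def nearest_int_distance(x: Fraction) -> Fraction:
    """||x||: distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)

def record_denominators(alpha: Fraction, q_max: int) -> list[int]:
    """q <= q_max at which ||q alpha|| is strictly smaller than at every smaller q."""
    records, best = [], None
    for q in range(1, q_max + 1):
        d = nearest_int_distance(q * alpha)
        if best is None or d < best:
            records.append(q)
            best = d
    return records

alpha = Fraction(85, 48)
print(record_denominators(alpha, 48))                         # [1, 4, 9, 13, 48]
print([nearest_int_distance(q * alpha) for q in (1, 4, 9, 13)])
# 11/48, 4/48, 3/48, 1/48 (printed in lowest terms)
```

The records occur exactly at the denominators $1, 4, 9, 13, 48$ of the convergents.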


Knowing how to identify the best approximations to a given real number is very useful. For example if we wish to find a non-trivial solution to Pell’s equation $p^2 - dq^2 = 1$, then we note that $p^2 \approx dq^2$ and so, taking square roots, we have $p \approx \sqrt{d}\,q$; that is, $p/q$ must be a good approximation to $\sqrt{d}$. In fact it is such a good approximation that we find every solution to Pell’s equation in positive integers among the convergents of the continued fraction for $\sqrt{d}$. This yields a remarkably efficient algorithm for finding solutions to Pell’s equation (much like the Euclidean algorithm is an efficient algorithm for finding gcds; the connection of the Euclidean algorithm to continued fractions was discussed in detail in section 1.5 and in section 1.8 of appendix 1A).
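A minimal Python sketch of this algorithm, assuming $d$ is a positive non-square integer. It generates the partial quotients of $\sqrt{d}$ with the standard integer recurrence for the complete quotients $(m + \sqrt{d})/s$ (an assumption of this sketch, not derived in the text above), builds the convergents $p/q$ with the recurrence (11.9.2), and stops at the first convergent with $p^2 - dq^2 = 1$. The name `pell_fundamental` is ours.

```python
from math import isqrt

def pell_fundamental(d: int) -> tuple[int, int]:
    """Smallest positive solution (p, q) of p^2 - d*q^2 = 1, for non-square d > 1,
    found among the convergents p/q of the continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0          # complete quotient is (m + sqrt(d)) / s
    p_prev, p = 1, a0           # p_{-1}, p_0
    q_prev, q = 0, 1            # q_{-1}, q_0
    while p * p - d * q * q != 1:
        m = s * a - m
        s = (d - m * m) // s
        a = (a0 + m) // s       # next partial quotient
        p_prev, p = p, a * p + p_prev   # recurrence (11.9.2)
        q_prev, q = q, a * q + q_prev
    return p, q

print(pell_fundamental(2))   # (3, 2)
print(pell_fundamental(61))  # (1766319049, 226153980)
```

The case $d = 61$ shows how large the smallest solution can be even for small $d$, while the loop still finishes after only a handful of steps.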


We will develop these remarks in detail in this appendix. 




11.9. Continued fractions for real numbers 


We have the following algorithm for constructing the continued fraction $[a_0, a_1, \dots]$ of a given real number $\alpha$, where each $a_i$ is an integer, and $a_m \ge 1$ for all $m \ge 1$: Suppose that we have already determined integers $a_0, \dots, a_{m-1}$ and a real number $\alpha_m$ such that $\alpha = [a_0, a_1, \dots, a_{m-1}, \alpha_m]$, starting with $\alpha_0 = \alpha$.

• Let $a_m := \lfloor \alpha_m \rfloor$. We will show that $\alpha_m > 1$ and so $a_m \ge 1$ for each $m \ge 1$.

• If $\alpha_m = a_m$, then $\alpha_m$ is an integer. Stop.

• If $\alpha_m > a_m$, then let $\alpha_{m+1} = 1/(\alpha_m - a_m)$.

Repeat this 3-step algorithm with $m$ replaced by $m + 1$.

Now $a_m + 1 > \alpha_m \ge a_m$, and so if $\alpha_m > a_m$, then $1 > \alpha_m - a_m > 0$ and therefore $\alpha_{m+1}$ is a real number $> 1$. Finally note that $\alpha_m = a_m + 1/\alpha_{m+1}$ and so

$$\alpha = [a_0, a_1, \dots, a_{m-1}, \alpha_m] = [a_0, a_1, \dots, a_{m-1}, a_m, \alpha_{m+1}].$$

This yields a unique continued fraction for each real number $\alpha$.


Exercise 11.9.1. Explain why if a has a finite length continued fraction, then the last term is 
an integer > 2. 


The convergents py/qn := [a0, G1, ---, Gn] for each n > 0 are defined as in 
section [1.5] and given by 


Pn Pn-1\ _ {a0 1\ faq 1 a, 1 
aay (Be Peale (m2). (He 2) 


Note that 
(11.9.2) Pn =AnPn-1+Pn—-2 and Gn =AnGn-1+Gn—-2 for all n > 2, 
so that the sequences pj, p2,... and qi, q2,..-. are increasing. Taking determinants 


yields that 


n n— -1 he 
(11.9.3) Bin, Pied 7) 


an Qn-1 Gn—19n 


for each n > 1. 


Next we show that [ao, ai, a2, ...] really does converge to a. Now a = 
[a0, @1, G2, ---, Gn; An4i], so that a = R/S where 


R Pn a Pn Pn-1 Ant+1 1 
S dn dn dn—-1 1 0}? 


R — An+1Pn T Pn-1 
S On+19dn 7 Qn-1 , 


which lies between coer and a for each n > 1, by exercise 11.9.2} 


and therefore 


(11.9.4) a= 


autAv 


Exercise 11.9.2. Show that if a,b, A, B, u,v are positive reals, then FL Be 


: a A 
lies between 5 and a 


11.9. Continued fractions for real numbers 425 


By (11.9.4) and then (11.9.3) we have 


Pn — On41Pn T Pn-1 Pn _ (=1)" 
Qn OAn41dn t+ dn—-1 In Qn(Qn+19n + dn—1) 


a 


This certainly + 0 as n — oo implying that p,/q, 4 a as n — oo; that is, 
[a9, @1, G2, ...] converges to a. Moreover this implies that a < py/qpn if n is odd, 
and a > pn/dn if n is even. 


Exercise 11.9.3. Deduce that 
po P2 |. Pa ” pL 
qo q2 q23 q2j+1 q3 q1 


and that pn/gn tends to a limit as n > oo. 


The last displayed estimate gives us very precise information about how fast 
the quotients converge to a: As a@n41 < Qn41 < Gn41 + 1 we obtain 


dn+1 < An+19n + dn—-1 < (Qn41 + Ldn + Qn—-1 = Gn+1 + dn < 2dn+1) 


yielding 
1 in 1 
(11.9.5) pee Nope 
2¢nIn+1 dn GnQn4+1 
The most famous example 7 = [3, 7,15, 1, 292,1,...], leads to the convergents 
eee <aT< ge 
106 113° 77 


Having such a large partial quotient as a4 = 292 means that the next convergent, 


peer has a far larger denominator, and therefore 
355 1 
-1077 
‘ a 113-33102 ~° 1° > 


a fantastic approximation. This was known to Archimedes in the third century 
B.cP The continued fraction for e displays an interesting pattern: 


e = (2,1,2,1,1,4,1,1,6,1,1,8,..]. 


One can generalize the notion of continued fractions to obtain 


=a 2 
2+ 5h 5+ 7, 


We will not pursue such representations here, as fun as they are. 


Exercise 11.9.4. Show how to use the continued fraction to determine, for any irrational real 
number a, arbitrarily small real numbers of the form a+ ba with a,b € Z. 


® Around 1650 B.C., ancient Egyptians approximated 7m by regular octagons obtaining 256/81, a 
method developed further by Archimedes in Greece, and then Liu Hui in China in the third century 
A.D. In 1168 B.C. the Talmudic scholar Maimonides claimed that 7 can only be known approximately, 
that is, that 7 is irrational. In the ninth century B.C. the Indian astronomer Yajnavalkya arguably gave 
the approximation 333/106 in Shatapatha Brahmana; in the 14th century A.D., Madhava of the Kerala 
school in India showed how to get arbitrarily good approximations to 7. 


426 Appendix 11B. Continued fractions 


11.10. How good are these approximations? 


Equation shows that the convergents of the continued fraction give excellent 
approximations to a. Lagrange showed that one cannot really do better in that 
ming<gq ||qa| (where ||t|| denotes the distance from t to the nearest integer) is always 
attained by q = qn, and no other q, which follows from the next result. 


Theorem 11.8. If 1 <q < qn41, then |qra— pn| < |ga— pl, with equality only if 
qd = dn and p =Pn- 


Proof. Let « = (—1)"(qpn+4i — Pdn41) and y = (—1)"(apn — pdn), so that 
Pn&—Pntiy=p and G®—dn4iy=4 


aS GnPnt1—PnQn+1 = (—1)”. We observe that « 4 0 or else g,+1 divides qQn41y = —q 
so that dnii < g, contradicting the hypothesis. We may assume that y 4 0 or else 
P = LPn, 4 = Ldn and the result follows. 


Now dnZ = dn4iyt+q where g < dn41 < |dn+iy|, and so dnz and gn+1y have the 
same sign, and therefore x and y have the same sign. We saw earlier that gna — Dn 
and gn+1@— Pn+i have opposite signs, and so x(qna@— pr) and y(qn41 — Pn+1) 
have opposite signs. Now qa — p = #(qn@ — Pn) — Y(Gn41@ — Pn41) and so 


laa — p| = |2(Gna — pn)| + |y(Gn410 — Pn4i)| > [dn — Pri. 


The result follows. 


Exercise 11.10.1. Deduce that if 1 < q < qn, then le - om 


< Ja- 2. 
q 


We next show that if p/q is a good enough approximation to a, then it must 
be a convergent. 


Corollary 11.10.1. Jf la 2| < a then : is a convergent for a. 


Proof. If g, <q < dn41, then |qna — p,| < 1/2q by Theorem [I1.8] Hence p/gq = 
Pn/Qn or else 


Pn 
In 


ae Lid 
2g? 2ddn ~ 94n’ 


1 2 Pn 


< 
qdn qd dn 


a contradiction. 


Exercise 11.10.2. Show that if -Vd+4 < p?—dq? < Vd with p,q > 1, then p/q is a convergent 
in the continued fraction for Vd. 


Exercise 11.10.3. Show that if d = 1 (mod 4) and ee < p?—pq4 (44 < va with 


p,q = 1, then p/q is a convergent in the continued fraction for aes 


In 1904 Fatou showed that if la - 2| < a then we cannot quite deduce that 


p/q is a convergent but we can come close, deducing that there exists an integer n 
for which 


Pn Pn — Pn-1 Pn + Pn-1 
== OF OF OS 


dn Gn — In-1 Gn T Gn-1 


11.11. Periodic continued fractions and Pell’s equation 427 


At least every other convergent yields as good an approximation as required in 
Corollary (11.10.1 


Lemma 11.10.1. The inequality Jo - 2| < 2a? is satisfied for at least one of 


P — Pn or Prt) for each n > 0. 
Qn+1 


qn 
Proof. If not, then, since a — . and a — rae have opposite signs, 
1 ae she 1 1 
= Pn Pnt+1 23% Pn 4 | Pn+1 S 5 sik 5 
GnQn+1 dn Qn4+1 dn Qn+1 2q5, 20744 


which is false for any distinct reals qn, dn+1- 


Hurwitz showed that for at least one of every three convergents one can im- 
prove this to < 1/(/5q?). However this is the best possible in general since 
the convergents for 14+v5 are Fret and, as we will prove in the next exercise, 


2/14/75 Fata 
i, 2 Fn 


1 
V5 


> as 2 —> ©. 


Exercise 11.10.4. Show that 14+v8 = [1,1,1,1,...] and so the convergents are F;,41/Fn where 
Fy, is the nth Fibonacci numbers. By using the general formula for Fibonacci numbers, determine 
how good these approximations are; i.e., prove a strong version of the formula in section|LL.4 


1+V5 Fati , (-1)” 
2 Fr | V5F2 


1 
~ 5FS 


11.11. Periodic continued fractions and Pell’s equation 


Given a solution to Pell’s equation, p?—dq? = +4 with p,q > 0, we have Vd+p/q > 
Vd so that 


P |p? — dq?| 4 
ari P(Vd+ pa) Vag? 
If d > 64, then this < 1/2q? and so p/q is a convergent for Vd by Corollary [1.10.1] 
What about in the other direction? If p/q is a convergent in the continued 
fraction Vd, then |Vd—p/q| < 1/@ and therefore |Vd+ p/q| < |\Vd—p/q|+2Vd < 
2Vd+1. Hence 


Ip? — dq*| = @° 


vi-2 vi+?| <2Vd+1. 


Not quite a solution to Pell’s equation but it at least guarantees that p? — dq? is a 
small integer in absolute value. 

In this section we study the continued fractions of quadratic irrationals. We 
will prove that, like the continued fraction for 145 = [1,1,1...], they are all 
eventually periodic: We say that [ao,a1,...] is periodic with period ao,...,@m—1 
where m > 1 if Qnim = Gy for all n > 0, and write 


(Go, QA1, «++, Gm—i\- 


Lemma 11.11.1. Ifa has a periodic continued fraction, then a is a real quadratic 
irrational. 


428 Appendix 11B. Continued fractions 


For example if a = [a], then a =a+ i, which implies a? — aa — 1 = 0 so that 
a = (a+ Vd)/2, where d = a? + 4. Therefore a? — d- 1? = —4, a solution to the 
Pell equation for d. 


Exercise 11.11.1. What numbers have continued fraction [a, b,a,b,a,...] where a and b are 
integers > 1? Can you use this to find solutions to a family of Pell equations? 


Proof. As a has a periodic continued fraction, with period length m, say, then 
Am = aand a= [ao, .--; @m—1, @] which, as above, implies that 

APm—-1 T Pm—-2 

Adm—-1 T Im—2 

Multiplying through by the denominator we find that a satisfies the quadratic 
equation 


(11.11.1) a = 


(11.11.2) dm—10" + (Gm—2 — Pm—1)@ — Pm—2 = 0. 


Therefore a is quadratic irrational, for it cannot be rational or else the continued 
fraction would have finite length (as in section [1.5). 


Lemma 11.11.2. [fa =u+ vv d, with d squarefree and u,v > 0, has a periodic 
continued fraction of period length m, then we have positive integer solutions Un, Un 
to Pell’s equation 
u2 — dv? = 4(-1)" 
whenever m divides n, taking 
Un = In—2 + Pn-1 and Un = 2qn—10- 


If we write én = 5 (Un + Vdun), then €, = €&, where n = mk for all integers k > 1. 


In the next section we will prove that any a of the form a = u+ vVd has an 
“eventually” periodic continued fraction. 


Proof. Now (L1L.11.1) is satisfied, with n replaced by m, whenever n is a multiple 
of m. Solving (LL.11.2) and noting that a is the positive root, we find that 


Pn-1 — In-2 + V(n—2 _ Pn—1)? + 4pn—29n-1 

2¢n—1 
Multiplying through by the denominator and comparing the square of the square 
root term on each side, we obtain 


(11.11.38) uwtovd =a = 


(dn—2 + Pn—1)” — 4(—1)” _ (dn—2 = Pn—1)" + Apn—29n—-1 = d(2qn—10)?. 
Since the left side is an integer, so is the right side, and the first part of the result 
follows. 


The matrix te a has eigenvalues €, and €&, = $ (Un — Vdu,). By 
n-1 n—-2 


we see that if mn = mk, then 


te ) _ Gee el 
dn-1 Qn-2 a Am-1 Am—2 , 
and therefore, comparing eigenvalues and noting that un,v, > 0, we deduce that 
é¢ =e%, for all k > 1. 


11.12. Quadratic irrationals and periodic continued fractions 429 


One can show that these generate all the solutions to Pell’s equation. Moreover 
we see that there is a solution to the negative Pell equation only if m is odd. 


11.12. Quadratic irrationals and periodic continued fractions 


In the previous section we worked with periodic continued fractions to find solutions 
to Pell’s equation, but is there really a periodic continued fraction for each Vd? 
In this subsection we develop a theory that allows us to easily determine whether 
a given number of the form u + vV/d, with u,v rational, has a periodic continued 
fraction. 


We say that a = [ao,a1,...] is eventually periodic if dn4m = Qn for all n > 1, 
for some integer r > 0, and we write a = [ao, ..., G@p—1,G, ---, Grtm-—i|- This a is 
a real quadratic irrational, since 7 := [@,, ..-, G@+m-—1| is real quadratic irrational 


by Lemma|L1.11.1} and 


(11.12.1) oo ee 
Y Gr—1 T Gr—2 


Theorem 11.9. The continued fraction of a quadratic irrational real number is 
eventually periodic. 


Proof. Suppose that a has minimal polynomial ax? + br + ¢ = a(x — a)(x — 8), 
and define 


f(a,y) = ax? + bay + cy? =(« y) G _ (") 


c y 


By (LL.11.1), (‘) =K & ay ee for some k # 0, and so if we define 


An Bn/2\ _ ( Pn Qn a b/2\ (Pr Pn-1 
B,,/2 Ch , Pn-1 Qn-1 b/2 c dn Qn-1 
(so that d:= b? — 4ac = B2 — 4A,,C,, by taking determinants of both sides), then 


An  Bn/2\ (an 
An@44 + Brongi t Cr = (n+ 1) ts . ) ( ) 


mle DGPS) (jee) (Bt) Ca") 
=? (a 1) & ) @ = x? f(a,1) =0. 


Therefore f,(z) := Anz? + Bnx + Cp has root dn41. Now An = f(Pn, qn) and 
Cy = Ay-1. By (11.9.5), 
Pn 

0 
dn 
so that, for all n > 1, we have 


a— Pn} < 4+ <1, and 
In Gn 


Pn 


<|B-al4 
dn 


<|B-—a|+1, 


Pn 


|An| =|f(Pnsan)| = ag2 la < a(|6 —o|+1) =a4+ Va. 


gas 
q 


n 


430 Appendix 11B. Continued fractions 


Since the A, are all integers, there are only finitely many possibilities for the 
values of A, and C;,, once n > 2. Moreover, given these values there are only 
two possibilities for B,, as B? = d+4A,C,. Hence there are only finitely many 
possible polynomials f,,(a2) and each corresponds to at most two roots, so one such 
root must repeat infinitely often. That is, there exists m < n for which a», = dn. 
The continued fraction for Qm is [@m,Qm+41,---;@n—1, Qn], and SO Am, is periodic 
with period n — m. 


Exercise 11.12.1. Deduce that the continued fraction for a is eventually periodic. 
Moreover if n > 2, then B? = d+4A,C, < d+ 4(a+ Vd)? so that |B,| < 


2a+3Vd. Now an = [An] and Qn41 = =Batvd Therefore, as each |A,,| > 1, we 
have 


Gn < Boal Vd 


(1112.9) <a+2Vd_ for alln>3. 


Exercise 11.12.2. Prove that if a= 1, then an < V/d+1 for alln > 3 in G@L12.9). 


We have shown that a has an eventually periodic continued fraction if and 
only if a is a quadratic irrational real number. However to apply Lemma [I1.11.2] 
we need a to have a periodic continued fraction. In the next result we give an easily 
verified criterion to determine whether a given a has a periodic continued fraction. 
We define the conjugate of u + ovd to be u— ovd. 


Proposition 11.12.1. Suppose a is a real quadratic irrational number with con- 
jugate —B. Then a has a periodic continued fraction if and only ifa>1> 6>0. 


Let d > 1 be an integer that is not a square. Let a = [Vd] + Vd, which is 
obviously > 1. Moreover a has conjugate —(vVd — [Vd]) = —{Vd}, and {Vd} lies 
in (0,1). Therefore a has a periodic continued fraction, which means we can use 
Lemma |L1.11.2}in order to find the solutions to Pell’s equation for d. 


Proof. By Theorem [11.9] the continued fraction of a is eventually periodic. 


Assume that a > 1> 6 > 0. This implies that a, >a, > 1 for all n > 0, and 
we now show that 0 < 6, < 1 for all n > 0 by induction. It is true for n = 0 as 
Go = B and so now take n > 1. Since an-1 = Gn-1 + 1/am by definition, we have 
—Bn—1 = An-1—1/Bn by taking conjugates. This means that an_1 = 1/Bn — Bn—1 
is an integer in (1/8, —1,1/8,) by the induction hypothesis, and so an—1 = [1/Bn] 
and hence 1/8, > 1 implying that 0 < 8, < 1 as required. Since the continued 
fraction for a is eventually periodic, there exists 0 < m <n with am = an; select 
m to be the minimal integer > 0 for which such an n exists. Then m = 0 or else 
taking conjugates gives Bn = Bn, so that am—1 = [1/8m] = [1/8n] = an—1 and 
hence Q@m—1 = Qm—1 + 1/Am = Gn—-1 + 1/an = Qn—1, contradicting the minimality 
of m. We have therefore proved that a has a periodic continued fraction. 

On the other hand if the continued fraction of a is purely periodic of period 
length n, then a > a9 = Gn > 1. Let f(x) := dn—127 + (Qn—2 — Pn—1)£ — Pn—2 = 0 
which has roots x = a and —£, as noted above. Now f(0) = —pn-2 < 0 and 
f(-1) = (dn—1 — In—2) + (Pn—1 — Pn—2) > 0, and so f has a root in (—1,0) by 


11.13. Solutions to Pell’s equation from a well-selected continued fraction 431 


the intermediate value theorem. This root cannot be a@ which is > 1 so it must be 
27 


Exercise 11.12.3. Prove that vd + [Vd] has a periodic continued fraction for any squarefree 
integer d > 1. 


Lemma 11.12.1. If a= [a@, ..., Gm—i], then the conjugate of a is —8 where 
B = [emis GAm—-2; +++ ao]. 


If we write a = u+vV4d, then u,v > 0 with v/d +u>1>vVd—u>0, by 
Proposition 


Proof. Let 6 = 1/y where y = [a@m—1, @m—2; ---, Go, 7], and 
Am—-1 1 ao 1\ [fa 1 Am—1 1 r _ (Pm-1 Um-1 
1 oO}; 1 0} 1 0} - 1 0 ~ Pm-2 WUm-2 ; 
so that 


YPm—-1 7 Im-1 

Y Pm—2 T Im—2 ; 
Multiplying through by the denominator this implies that —6 satisfies the same 
quadratic equation (T1112) as a, and they must be distinct roots as a > 0 > —6d, 
and so 6 = £. 


PY 


11.13. Solutions to Pell’s equation from a well-selected continued 
fraction 


The above results combine to show that the continued fraction of Vd has some very 
special structure: 


Corollary 11.13.1. [fd > 1 ts an integer but not a square then one can write 
Vd= [bo, 61,---, bm] where b; = bm; for j =1,2,...,m—1, 


and by = [Vd] with bm = 2b 9. Let m be the smallest such integer, and suppose that 


the nth convergent of the continued fraction is P,/Qn. The fundamental solution 


to the Pell equation x? — dy? = +1 is given by (x,y) = (Pm—1;Qm-—1), with 


Pai a di -1 = (-1)™. 
All other solutions to Pell’s equation with x,y > 0 are given by 
Ps a dQn-1 = (-1)” 
whenever m divides n, so thatn = mk for some integer k > 1, and 


Pr-1 =: VdQn—1 = (Pin-1 at VdQm—1)*. 


Proof. Let a = bp + Vd, which has conjugate —G where B = Vd — bp = a — 2bp. 
As ao = 2b9, Lemma [11.12.1] then implies that b; = aj; = Gm—j; = bm, for all 
j =1,2,...,m—1. Moreover by, = am = 2bo. 


By definition the nth convergent in the continued fraction for @ is pp/dn = 
bo + Pn/Qn, and so Qn = qn and P, = pr — bogn for all n > 1. We apply 
Lemma [i1.11.2] Now (1.113) implies that v = 1 so that v, = 2qn—1 = 2Qn-1, 


432 Appendix 11B. Continued fractions 


and 2qn—1b9 = Pn—1 — Gn—2, 80 that Py, = bo Qn—1 + Qn—2, and therefore uy, = 
Pn—-1 + dn—2 = Pr-1 + bo Qn-1 Tr Qn—2 = 2P,,-1- Therefore | —— a 4 = (—1)” 
whenever m divides n, by Lemma [I1.11.2] 


Exercise 11.13.1. Prove that if b,4; = b; for all 7 > 1, then m divides n. 


Now suppose that x? — dy? = +1 with 2, y positive integers. If d > 2, then, by 
exercise [11.10.2] we see that x/y is a convergent in the continued fraction for Vd, 
say py—1/Qr—1 for some r > 2. If we write Vd= [bo,.--, br—1, y], then 


Vad = Pr-17 - Pr—-2 
Gr-1) + Gr—2 
so that (pp—1— Vdqr—1)7 = —(pr—2— Vdgy—2). Multiplying this through by p,—1+ 
Vdq,p—1, we obtain 
(-1)"y= (pe_4 = dq?_4)y = —(pr_2 — Vdqr—2)(Pr—1 a Vdqr—1) 


= (p; 19r—2 — Pr—24r 1)Vd + (dp, 19r—2 — Pr—1Pr 2), 


so that y = Vd+c for some integer c. By definition y has a periodic continued 
fraction and so by exercise [[1.12.3]we know that c = bo. This implies that b, = 2b9 
and b,+; = b; for all 7 > 1. Therefore m divides r by exercise [1.13-1] The 
result follows (other than this last part when d = 2 which follows from finding the 
fundamental unit 17—2-1? = —1 from the first convergent of the continued fraction 


J? = (1, 2). 


We deduce from Corollary [1.13.1] that there is a solution to the negative Pell 
equation x?—dy? = —1 if and only if m, the minimum period length of the continued 
fraction, is odd. 


Here are some examples of the continued fraction for Vd: 
f= (1, 2), V3 = [1,1 2), V/s 4), 9/6 = [2 2, a, V7 = 2, TL, 1 4 
V8 = [2, I, 4), 4/10 = (3, Gl, 4/11 = (3, 3, bly Vv 12 = [8, 2,5, 
af 18 = [oy Uy, ay 1G), a Se, A Be Te Gy ene 


Next we give only the longest continued fractions and the largest fundamental 
solutions for discriminants up to 135: 


V2=(1,2), 2 —2:1? =<1, 
V3=(1,1,2], 22-3-1?=1, 
V6 =([2,2,4, 5°-6-2?=1, 
V7 =? LL LA) Ste =, 
V13 = [3,1,1,1,1,6], 18?- 13-5? =-1, 
V19 = [4,2,1,3,1,2,8], 1707-19-39? =1, 
V2 = [4,1,2,4,2,1,8], 1977-22-42? =1, 
V31 = [5,1,1,3,5,3,1,1,10], 1520? — 31-273? =1, 
/43 = |6, 1,1,3,1,5,1)5)1,5 12), 34827 — 43-531? =1, 


11.13. Solutions to Pell’s equation from a well-selected continued fraction 433 


/46 = (6, 1,3,1,1,2,6,2,1,1,3,1,12], 24335? — 46-3588? = 1, 
/76 = [8,. 1,2, 1,1, 5, 4,5, 1,1, 2,1, 16], 577997 — 76-6630? = 1, 


/94 = [9, 1,2,3,1,1,5,1,8,1,5,1,1,3,2,1,18 
4/124 = (11, 7, 2, 1, ly 1,8) 1, 4 1,3) 1, 1,2, 7,22 
133 = [11, 1,1,7,5,1,1,1,2,1,1,1,5,7,1,1,22 


, 21432957 — 94. 221064? = 1, 


, 4620799? — 124 - 414960? = 1, 


, 25885997 — 133 . 224460? = 1. 


Now we give the largest fundamental solutions, and their continued fraction lengths, 
for discriminants up to 750: 


Length = 18: 77563250? — 139 - 65788297 = 1, 

Length = 20: 1728148040? — 151 - 140634693” = 1, 

Length = 22: 1700902565? — 166 - 132015642? = 1, 

Length = 26: 2783543736507 — 211 - 191627053537 = 1, 

Length = 26: 695359189925? — 214 - 47533775646? = 1, 

Length = 26: 5883392537695? — 301 - 339113108232? = 1, 

Length = 34: 2785589801443970? — 331 - 153109862634573? = 1, 

Length = 37: 44042445696821418" — 421 - 21464974635307857 = — 

Length = 40: 84056091546952933775? — 526 - 36650197573242955327 = 1, 
Length = 42: 181124355061630786130? — 571 - 7579818350628982587" = 1, 
Length = 44: 59729912963116831997 — 604 - 2430375690639517207 = 1, 
Length = 48: 489615753129986500355607 — 631 - 19491295375751510364277 = 1. 


The period length of the continued fractions of the Vd seems to mostly be around 
size 2V/d, and the size of the fundamental solutions mostly seems to be around 104, 
We believe that something like these estimates are true in general, extrapolating 
from this computational evidence, though these claims are not yet proven. There 
are some “thin” classes of discriminants for which the period length of the continued 
fractions and the fundamental solutions are much smaller. (For example, if d = 
m? +1, then Q(Vd) has the fundamental solution m+ Vd to Pell’s equation, and 
Vd = [m, 2m] has periodic length one, for all integers m > 1.) 


Corollary 11.13.2 (Relation between the continued fraction length and the size 
of the fundamental solution). Suppose that Vd has a continued fraction that is 
eventually periodic of period m. Let €¢g = u+vuVvd where u,v give the first solution 
to u? — dv? = 1 as in Corollary Then (BAS re < eg 2 Wd po). 


Proof. Let 7, := P, + VdQ, for all r > 0, so that 7) = bo + Vd and Th = 
baTn + T-—1 for all n > 1, starting with 7-; = 1. Since the minimum polynomial 
for Vd is x? — d, exercise [1.12.2] implies that each b, < Vd +1 (for n = 1,2 we 


434 Appendix 11B. Continued fractions 


deduce this as by, = bn4m). The T, are obviously increasing, and so 
i <Oe-4+ ia oo < Wd ry <a or. 


The upper bound follows as €g = Tm—1 by Corollary [11.13.1} The lower bound 
follows from noting that T, > F,+3 for n = —1,0, and then by induction for all n, 


and s0 Tm—1 > Fin42 > (6 he, 


In section [11.2] we saw that the fundamental units can be of the form utuvd 


with wu and v odd when d = 1 (mod 4). To find these we prove the analogy to 
Corollary for the continued fraction of Ltd (in place of the continued 
fraction of Vd). 


Corollary 11.13.3. [fd > 1 is =1 (mod 4) but is not a square, then 
1+vd 
2 


and bm = 2b9 — 1 is the largest odd integer < Vd. Let m be the smallest such 
integer, and suppose that the nth convergent of the continued fraction is Pp /Qn.- 
The fundamental solution to the Pell equation x? — dy? = +4 is given by (x,y) = 
(2Pm 1. = Qm 1,Qm 1); with 


(2Pm—1 ~~ (aa _ dQ?,-1 — 4-1), 


which can be rewritten as 


l-d 
Te 4 _ Pr-1Qm-1 + (+) ae = ee as 


All other solutions to Pell’s equation with x,y > 0 are given by 
(2Pp—1 — Qual = dQ?_4 = 4(—1)" 


whenever m divides n, so thatn = mk for some integer k > 1, and 


ae (: ) — (Pa 7 (: ) 2n-1] . 


Proof. Let ao be the largest odd integer < Vd and a = aoe so that [a] = ao. 


= [bo, b1,..., bm] where bj = bm; for j =1,2,...,m—1, 


Now a has conjugate —6 where 8 = a—ag € (0,1), and so a = [@, ..., Gm—i| with 
B= [0,@m—1, G@m—2, ---, Go] by Lemma[L1.12.1]. Equating 8 = wee =a-—aj = 
[0,a7, ..-, Gm], we see that a; = Gm_,; for 7 =0,1,...,m. Now ce =a ae 
and so bo = Hust and b; = a, for all j > 1. In particular b,, = am, = a9 = 2b — 1. 


By definition py /dn = (bo -1)+Pr/Qn, and so Qn = dn and Py, = pn—(bo—-1)an 
for alln > 1. We apply Lemma [11.11.2} so (LL11.3) implies that 2v = 1 with 
Un = In-1 = Qn-1 and In—1(2bo _ 1) = Qn—-14% = Pn—-1 — In—2, SO that Phi = 
boQn—-1 + Qn—2. Therefore 


Un = Pn 1+4n 2=Ph 1+ (bo — 1Qn-1 + Qn—2 = 2Pr—1 — Qn-1. 


Now suppose that x? — dy? = +4 with x,y > 0, so that x = y (mod 2). If 
d > 9, then take p = “4% and q = y in exercise [1110.3] we see that p/q is a 


11.13. Solutions to Pell’s equation from a well-selected continued fraction 435 


i 


convergent in the continued fraction for + , Say Pr—1/Gr—1 for some r > 2. If we 


write 1ivd — = [bo,.--,bp—1, 7], then 


L+Vd _ Pri +Pr-2 
2 Gr—1Y + Gr—2 
Proceeding analogously to the proof of Corollary [113-1] we deduce that y = 


1tvd +c for some integer c. By definition y has a periodic continued fraction and 
so we must have c = bo — 1. This implies that 6, = 2b) — 1 and b,,; = 6; for 
all j > 1. Therefore m divides r by exercise [1.13.1] The result follows (other 
than this last part when d = 5 which follows from finding the fundamental unit 
1?—5-1? = —4 from the first convergent of the continued fraction = = (i): 


The first few examples of the continued fraction for wd 


14+V5 _ = [I], Itvi3 — 31, 14vi7 — 12, 1 


are 


feat = 0,4, a), 


o] 


[2, 
Ay28 (3, 5], Ss = |B) 2, 1, 2, 3h, 


i+V4i _ [3, 1, 2, 2, 1, 5], ee 1, Bliss 


Now we give only the longest continued fractions: 


= = [4,3,1,1,1,3,7], se = [4,1,3,2,1,1,2,3,1,7], 

a = [5,2,2,1,4,4,1, 2,2, 9], = = (6,5,1,1,2,3,2,1,1,5,11], 
cand = (6, 1,5, 2,2, 1,2,2,5,1,11], 

ee 7,6,1,1,2,1,3,1,2,1,1,6, 13], 

a 7,2,4,6,1,2,1,1,1,1,2,1,6,4,2, 13), 

iL 7,1,6,2,3,4,1,1,1,1,1,4,3,2,6, 1,13], 

= = (8,3,1,4,2,2,1,1,1,7,7,1,1,1,2,2,41,3,15), 

ee = (9,2,1,8,5,1,3,1,1,2,2,1,1,3,1,5,8,1,2, 17], 

= = (9,1,2,8,1,5,4,2,2,1,1,1,1,2,2,4,5,1,8,2,1,17, 

sys = (10,1,1,1,1,2,1,3,3,9,1,4,6,1,1,6,4,1,9,3,3,1,2,1,1,1,1, 19. 


The period in this last continued fraction has length 27, and so the fundamental 
solution to Pell’s equation is 


1119217969687 — 409 - 55341766857 = — 


436 Appendix 11B. Continued fractions 


11.14. Sums of two squares from continued fractions 


Corollary 11.14.1. The continued fraction btvd = [G, .--, Gm—1] 1s symmetric 


(that is, a; = @m—1—-; for j =0,...,m—1) if and only if a? +b? =d. 


Proof. Ifa = btvd = |G, ..-) Gm—1| is symmetric, with conjugate 6, then aG = 1 
by Lemma|L1.12.1} Therefore 


P—d_ b+Vd b-Vvad 


2 


=a-(—$)=-1,; 


a a a 


that is, a? + b? = d. 

On the other hand if a2 + 6? = d with a,b > 0, then let a = +“4. Now 
a < Vd so that a > 1. Moreover — is the conjugate of a where 3 = va—b As 
b< Vd < a+b we deduce that 0 < 6B <1. Therefore a has a periodic continued 
fraction by Proposition Now a = 1/6 and so the continued fraction is 
symmetric by Lemma [IL.12.1] 


Proposition 11.14.1. Suppose that A 4 0 and B are integers for which A divides 
B? —d and B+¥4 = [bg,bi,..., baegi] where bj = beegi-y for j = 1,2,...,2k. 
Then 


[be41, be+2, sey bor, bok+1) by, see be-1, bg] — 


b+vd 
a 
for some integers a #0 and b for which a? + b? = d. 


Proof. If rivd = [bo, b1,...] where r and s are integers for which s divides r? — d, 
then, for R = sbo — r and S = (d— R?)/s, we have 


tu=1/ (“*) Seinaes 


S S 


We note that S is an integer since d— R? = d—r? = 0 (mod s), and that S 
divides R? — d as d— R? = 8S. Iterating this we see that [bp41, dx+42,...] takes 


the form btvd as claimed. Now this continued fraction is also purely periodic and 


symmetric, which implies that a? + b? = d by Corollary L141 


For examples, 14V18 = [2,3] and so [3] = S47 18 | and V13 = [3,1,1,1,1,6] and 
(1, 1, 6, 1, 1] = 248. Either way we find that 13 = 2? + 3?. 
Typically we apply Proposition [L141] with the continued fractions for Vd or 


aes which works provided there is a solution to the negative Pell equation (as 
the period in Proposition [1.14.1] must be odd). 


Modern uses of continued fractions 437 


Additional exercises. 


Exercise 11.14.1. The number ¢ = 1+Vv5 satisfies the equation ¢ = 1+ 1/¢. 


(a) Iterate this to obtain ¢ = 1+1/(1+1/¢), and then reprove the first part of exercise[11.10.4 
(b) Show that if a is the positive root of x? — ax — b = 0 where a and b are positive integers, 
then 


Exercise 11.14.2. Let d= 14-5 | 
(a) Show that ¢= /1+ 4. 
(b) Iterate this to obtain ¢ = \/1+/1+4 4, and then prove that 


c) Show that if a is the positive root of x? — az — b = 0 where a and b are positive integers, 
19) g 


then 
nays t als ta\/b+aVb+--:. 


Exercise 11.14.3 (Hermite, Serret 1848). We give an efficient way to determine, for a given 


prime p = 1 (mod 4), integers a and b for which p = a? + b?: Let r?2 = —1 (mod p) with 
0<r< p/2, and write p/r = [ao,..., an]. We will show that n = 2m + 1 is odd and a and b can 
be determined from a/c = [ao,a1,...,@m] and b/d = [ao, a1,..., @m—1]. 


(a) Use that an > 2 to deduce that 0 < s < p/2 where s/é = [ao, a1, 
(b) Prove that s =r and n = 2m +1 for some integer m. 

(c) Show that a; = a,j; for j =0,1,...,n. 

(d) Show that p = a? + b?. 


iy a4: 


Modern uses of continued fractions 


[1] W. Duke, Continued fractions and modular functions, Bull. Amer. Math. Soc. 42 (2005), 137-162. 


Appendix 11C. Two-variable 
quadratic equations 


11.15. Integer solutions to 2-variable quadratics 


Theorem 11.10. Jf there is one solution in integers to 
ax” + bry + cy? + dx+ey+ f =0, 
where a,b,c,d,e, f € Z, then there are infinitely many, except perhaps when a = 


b=c=0 or when b? — 4ac = 0. 


Proof. If a 4 0, we multiply through by 4a to complete the square so that if 
z = 2ax + by + d, then 
2? = py? +2aytr 
where p= b?—4ac#0, q=bd—2ae, and r=d? —4af. 
We multiply through by p to complete the square so that if w = py+q, then 
w? — pz* = q* — pr. 
Taking any solution to Pell’s equation u? — pv? = 1 with u = 1 (mod p) from 
exercise [11.6.3] we create a new solution to our equation by taking 
W=wut+pvz and Z=uz+wv, sinceW+ /pZ = (ut ./pv)(w + pz), 


which yields a new integer solution X,Y to the original equation given by 


—1 
X = (u— bv)x — 2cvy — ev + (u ‘the 2cd), 
Pp 


(u= 1) 


Y = (u+ bu)y + 2ava + du + (bd — 2ae). 


If a = 0 but c £ 0, we swap the roles of the variables x and y. If a = c = 0, then 
b £0 and we replace y by x + y. 


Exercise 11.15.1. What happens in the exceptional cases a = b = c = 0 and b? — 4ac = 0? 


438 


Appendix 11D. 
Transcendental numbers 


11.16. Diagonalization 


In section [11.4] we exhibited a transcendental number and proved that it is tran- 
scendental using Liouville’s Theorem. In general it is difficult to prove that a given 
number is transcendental (which is why the Liouville number exhibited in section 
[11.4] looks so obscure) though one expects that any number that is not rather ob- 
viously algebraic will be transcendental. We now give Cantor’s “diagonalization” 
proof that there are far more transcendental numbers than algebraic (after appre- 
ciating how one type of infinity can be larger than another): 


A set S is said to be countable if one can assign an order to the elements of 
S, say $1,82,.--,5n,---, using the positive integers as indices, and this sequence 
includes all of the elements of S. It is evident that the integers are countable (one 
can order them as 0,1,—1,2,—2,...), and also the rationals (one can order them 
by listing all p/q with q > 0 and (p,q) = 1 such that |p| + |g] = 1, then such that 
|p| + |q| = 2, 3, etc.). This gives some idea of how to show the algebraic numbers 
are countable: 


Exercise 11.16.1. Any given algebraic number a has a minimum polynomial ar ajax" € Za] 
with d>1. Assign a weight w(a) := dyne4 jas|. 

(a) Prove that there are finitely many a of any given weight. 

(b) Deduce that the algebraic numbers are countable. 


Exercise 11.16.2. Reprove the last result using the bijection Z[x] «+ Q defined by 


ao faye +--- tage? 24034 pada 
where p; = 2 < po <--- is the sequence of primes. Can you construct a bijection Z[z] + Z? 


We now show that the real numbers are uncountable (that is, not countable), 
and hence there are more of them than just the algebraic numbers; in other words 


439 


440 Appendix 11D. Transcendental numbers 


there are many transcendental real numbers. So, let us assume that the real num- 
bers are countable and therefore can all be put in a list as 71,72,..... We now 
construct a real number  € (0,1), that is not on the list, by selecting digits for 6 
written in decimal, one at a time. The trick is that the jth digit of 6 should be 
different from the jth digit of rj £9 But then 6 #r; for any 7 (since they differ in 
the jth digit), which contradicts the assumption that the r;’s were a list of all the 
real numbers (as the list does not include 3). Therefore the real numbers are not 
countable, and so uncountable. 


Philosophically this is quite an extraordinary result as it tells us that there 
are (at least) two very different types of infinity, the countable infinity like Z, and 
the uncountable infinity like R. Are there other infinities? This is an important 
question but we start to tread too far away from number theory. 


11.17. The hunt for transcendental numbers 


Proving that any specific number is transcendental is tough. The two numbers that we most wanted to understand were, for a long time, $\pi$ and $e$. Even proving that they are irrational is not so easy:


Exercise 11.17.1. Suppose that $e$ is rational, say $e = p/q$, and let $r = q!\,(1 + 1/1! + \cdots + 1/q!)$.
(a) Show that $r$ is an integer and therefore $q!\,e - r$ is an integer.
(b) Prove that $q!\,e - r \in (0,1)$.

(c) Use these remarks to establish a contradiction.
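For part (b) of Exercise 11.17.1, note that $q!\,e - r = \sum_{k > q} q!/k! = \frac{1}{q+1} + \frac{1}{(q+1)(q+2)} + \cdots$, which lies strictly between $\frac{1}{q+1}$ and $\frac{1}{q}$. The Python sketch below checks this with exact fractions. It truncates the tail after a fixed number of terms; the cut-off `terms=40` is our own arbitrary choice. This illustrates the inequality but proves nothing.

```python
from fractions import Fraction
import math


def tail(q, terms=40):
    """Exact value of sum_{k=q+1}^{q+terms} q!/k!, a truncation of q! e - r."""
    total, ratio = Fraction(0), Fraction(1)
    for k in range(q + 1, q + terms + 1):
        ratio /= k                 # now ratio == q!/k!
        total += ratio
    return total


for q in range(1, 6):
    t = tail(q)
    print(q, float(t), Fraction(1, q + 1) < t < Fraction(1, q))

print(float(tail(1)), math.e - 2)  # the q = 1 tail is e - 2
```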


Exercise 11.17.2. Given $f(x) \in \mathbb{C}[x]$, define $F(x) := f(x) - f^{(2)}(x) + f^{(4)}(x) - f^{(6)}(x) + \cdots$, where $f^{(k)}(x) = \frac{d^k f}{dx^k}$.
(a) Prove that $\frac{d}{dx}\big(F'(x)\sin x - F(x)\cos x\big) = f(x)\sin x$.
(b) Deduce that $\int_0^\pi f(x)\sin x\,dx = F(\pi) + F(0)$.

(c) Show that if $f(x) = f(\pi - x)$, then $F(\pi) = F(0)$.

(d) Now, suppose that $\pi$ is rational, say $\pi = p/q$, and let $f(x) = x^n(p - qx)^n/n!$ for some integer $n \geq 1$. By establishing that $f^{(k)}(0)$ is an integer for all $k \geq 0$, prove that $F(0)$ and $F(\pi)$ are integers.

(e) Show that if $0 < x < \pi$, then $0 < f(x) \leq (p^2/4q)^n/n!$.

Assume that $n > e\,p^2/4q$.

(f) In exercise 4.14.2 we proved that $n! > 2n^n/e^n$. Deduce that if $0 < x < \pi$, then $0 < f(x) < \frac{1}{2}$.

(g) Prove that $\int_0^\pi |\sin x|\,dx = 2$, and deduce that $0 < \int_0^\pi f(x)\sin x\,dx < 1$.

(h) Combine the above to deduce that $\pi$ is irrational.
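Part (d) of Exercise 11.17.2 rests on the fact that every derivative of $f(x) = x^n(p - qx)^n/n!$ at $0$ is an integer. This holds because $f^{(k)}(0)$ is $k!$ times the coefficient of $x^k$, that coefficient has denominator dividing $n!$, and it vanishes unless $k \geq n$. The sketch below checks this with exact arithmetic. The pair $p = 22$, $q = 7$ is only a stand-in (any integers would do), and the helper names are ours.

```python
from fractions import Fraction
from math import comb, factorial


def derivatives_at_zero(n, p, q):
    """Return [f^(k)(0) for k = 0, ..., 2n] where f(x) = x^n (p - q x)^n / n!."""
    values = [Fraction(0)] * (2 * n + 1)
    for j in range(n + 1):
        # binomial expansion: the x^(n+j) coefficient is C(n,j) p^(n-j) (-q)^j / n!
        coeff = Fraction(comb(n, j) * p ** (n - j) * (-q) ** j, factorial(n))
        values[n + j] = coeff * factorial(n + j)      # f^(k)(0) = k! * [x^k] f
    return values


def F_at_zero(n, p, q):
    """F(0) = f(0) - f''(0) + f''''(0) - ..., a finite sum since deg f = 2n."""
    d = derivatives_at_zero(n, p, q)
    return sum((-1) ** i * d[2 * i] for i in range(n + 1))


print(derivatives_at_zero(2, 22, 7))
print(F_at_zero(5, 22, 7))
```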


In 1873 Hermite succeeded in proving that $e$ is transcendental, and then, nine years later, Lindemann succeeded in proving that $\pi$ is transcendental. (The known proofs are probably a little too difficult for this book, but look at the discussion in section 6.6 of [Bak84].) Lindemann's Theorem implied that it is impossible to “square the circle” using a compass and ruler (see section 0.18 of appendix 0G). Lindemann actually proved the extraordinary result that for any distinct algebraic numbers $\alpha_1, \dots, \alpha_k$, there do not exist non-zero algebraic numbers $\beta_1, \dots, \beta_k$ for which

$\beta_1 e^{\alpha_1} + \beta_2 e^{\alpha_2} + \cdots + \beta_k e^{\alpha_k} = 0.$


10 Any suitable protocol will do. For example, we can let the $j$th digit of $\beta$ be 1, unless the $j$th digit of $r_j$ is 1, in which case we let the $j$th digit of $\beta$ be 7.




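One deduces that $\pi$ is transcendental: since $e^{i\pi} + e^0 = 0$ (take $\alpha_1 = i\pi$, $\alpha_2 = 0$ and $\beta_1 = \beta_2 = 1$), $i\pi$ cannot be algebraic, and as $i$ is algebraic, neither is $\pi$. One also can deduce that $e$ is transcendental and that $\log \alpha$ is transcendental for any algebraic $\alpha \neq 0, 1$.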


Exercise 11.17.3. Let $\alpha := \sqrt{2}^{\sqrt{2}}$. We wish to show that there exist irrational numbers $x, y$ such that $x^y$ is rational. Use either $\alpha$ or $\alpha^{\sqrt{2}}$ to prove this.


Gelfond and Schneider showed, in 1934, that if $\alpha$ is algebraic and $\neq 0, 1$, and $\beta$ is algebraic and irrational, then $\alpha^\beta$ is transcendental. In particular this implies that both $2^{\sqrt{2}}$ and $e^\pi$ are transcendental, which had been famous questions of Hilbert.


Finally, in 1966 Baker established that any non-vanishing linear form

$\beta_1 \log \alpha_1 + \cdots + \beta_n \log \alpha_n$

is transcendental, where the $\alpha_j$'s and $\beta_j$'s denote non-zero algebraic numbers, with $\alpha_j \neq 1$. Examples include $\log(p/q)$ for any rational number $p/q \neq 0, 1$.


There are many numbers that have resisted attack, in that we do not know whether they are irrational, for example $\gamma$ (the Euler-Mascheroni constant from section 4.14 of appendix 4D) and $\zeta(2m+1)$ for all $m \geq 2$.$^{11}$ (We do know that $\zeta(2m) = \frac{(-1)^{m+1}(2\pi)^{2m}B_{2m}}{2\,(2m)!}$ for each $m \geq 1$, where $B_n$ is the $n$th Bernoulli number. This is a rational multiple of $\pi^{2m}$ and therefore transcendental.) Moreover, even though we know that $e$ and $\pi$ are both transcendental, we know nothing about $e + \pi$; for all we know it could be rational, though that seems very unlikely.


We expect more or less any class of interesting numbers to be transcendental and for their sums to be transcendental. Of great interest is the set of “non-trivial” zeros of the Riemann zeta function (see appendix 5D). These can be written as $\frac12 + i\gamma_1, \frac12 + i\gamma_2, \frac12 + i\gamma_3, \dots$ with $0 < \gamma_1 < \gamma_2 < \cdots$, assuming the Riemann Hypothesis. We believe that each of the $\gamma_j$'s is transcendental and that they are linearly independent over the rationals (the Linear Independence Hypothesis). However no part of these beliefs has been proved, and there is very little evidence either way for the algebraic nature of the $\gamma_j$'s.


Exercise 11.17.4. Prove that if $\alpha$ is transcendental, then $a_0 + a_1\alpha + \cdots + a_n\alpha^n$ is also transcendental whenever the $a_j \in \overline{\mathbb{Q}}$ (the set of algebraic numbers, which is the algebraic closure of $\mathbb{Q}$) with $n \geq 1$ and $a_n \neq 0$.


11.18. Normal numbers 


What do the digits of $\pi$ look like? We know $\pi$ is a transcendental number and believe that it has no particular algebraic properties, so we might guess that there is nothing special about its digits. For example, we might ask whether there are more 3's than 8's when we write out the digits of $\pi$, or whether the five-digit pattern 31415 occurs more often than any other pattern. It seems not, and it seems unlikely that the digits of $\pi$ will favor any particular digit pattern, but we really do not know. Moreover, we asked these questions in the base 10 representation of $\pi$. What about the base 2 representation? Or the base 37 representation?$^{12}$

11 There have been one or two successes on this question: In 1981 Apéry showed that $\zeta(3)$ is irrational, though we do not know whether it is transcendental. In 2001, Ball and Rivoal showed that infinitely many of the $\zeta(2m+1)$ are irrational, and in 2004 Zudilin showed that at least one of $\zeta(5), \zeta(7), \zeta(9), \zeta(11)$ is irrational.


We define $\eta$ to be a normal number in base $b$ as follows: Write the fractional part of $\eta$ as $\{\eta\} = \eta_1/b + \eta_2/b^2 + \cdots$ in base $b$, where the $\eta_i$'s are the base $b$ digits (and so are integers between $0$ and $b-1$). For each integer $d \geq 1$ and for every $d$-digit integer in base $b$ (that is, the integers $r$ in the range $0 \leq r \leq b^d - 1$), we should have that

$\lim_{M \to \infty} \frac{1}{M}\,\#\{m \leq M :\ \eta_m b^{d-1} + \eta_{m+1} b^{d-2} + \cdots + \eta_{m+d-1} b^0 = r\} = \frac{1}{b^d}.$

That is, for every $d \geq 1$, every base $b$ digit pattern of length $d$ occurs with the same limiting frequency, $1/b^d$. A number is absolutely normal if it is normal in every base $b > 1$. It is not difficult to show that “almost all” real numbers are normal in base $b$; that is, if one picks a real number at random, then with probability 1 it is normal in base $b$.


There are some fun examples of normal numbers, like Champernowne’s number 
0.1234567891011121314151617..., 


obtained by concatenating the decimal expansions of the positive integers, which is 
normal in base 10; and similarly 


0.235711131719232931374143 ..., 


obtained by concatenating the prime numbers, is normal in base 10, and the same 
construction works in every base b > 1. 
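You can explore the definition of normality experimentally by counting length-$d$ blocks among the first $N$ digits, as in the limit defining normality. The sketch below builds the digits of Champernowne's number in base $b$ and tallies overlapping blocks; the helper names are ours. A finite count like this can suggest normality but never prove it.

```python
from collections import Counter
from itertools import count


def to_base(n, b):
    """Digits of the positive integer n in base b, most significant first."""
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(r)
    return digits[::-1]


def champernowne(n_digits, b=10):
    """First n_digits base-b digits of 0.(1)(2)(3)... (Champernowne's number)."""
    out = []
    for k in count(1):
        out.extend(to_base(k, b))
        if len(out) >= n_digits:
            return out[:n_digits]


def block_counts(digits, d):
    """Count each length-d block eta_m ... eta_{m+d-1} over all starting m."""
    return Counter(tuple(digits[m:m + d]) for m in range(len(digits) - d + 1))


# The first 189 decimal digits are exactly the integers 1, ..., 99:
# digit 0 appears 9 times and each of the digits 1-9 appears 20 times.
freq = block_counts(champernowne(189), 1)
print(sorted(freq.items()))
```

The small count of 0's shows how slowly the frequencies in Champernowne's number approach $1/10$.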


The definition of a normal number is complicated and can be reworked as follows: The condition $\eta_m b^{d-1} + \eta_{m+1} b^{d-2} + \cdots + \eta_{m+d-1} b^0 = r$ will now be rephrased, by dividing through by $b^d$, as $\eta_m/b + \eta_{m+1}/b^2 + \cdots + \eta_{m+d-1}/b^d = r/b^d$, which is the same as

(11.18.1)   $\{\eta b^{m-1}\} = \frac{\eta_m}{b} + \frac{\eta_{m+1}}{b^2} + \cdots + \frac{\eta_{m+d-1}}{b^d} + \frac{\eta_{m+d}}{b^{d+1}} + \cdots \in \Big[\frac{r}{b^d}, \frac{r+1}{b^d}\Big).$

Therefore $\eta$ is a normal number in base $b$ if and only if the sequence $\{\eta b^{m-1}\}_{m \geq 1}$ is uniformly distributed mod 1. By Weyl's criterion (section 11.7 of appendix 11A) this holds if and only if for every integer $k \geq 1$ we have

$\lim_{M \to \infty} \frac{1}{M} \sum_{1 \leq m \leq M} e(k\eta b^{m-1})$ exists and equals 0

(where $e(t) := e^{2\pi i t}$ as usual).
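Identity (11.18.1) says that the block of $d$ digits starting at position $m$ equals $r$ exactly when $\{\eta b^{m-1}\}$ lands in $[r/b^d, (r+1)/b^d)$. The sketch below checks this with exact fractions for a number with a finite base-$b$ expansion, so the fractional part can be computed exactly. The sample digits (the first ten decimal digits of $\pi$ after the point) and the helper names are our own choices.

```python
from fractions import Fraction


def shifted_fractional_part(digits, m, b):
    """{eta * b^(m-1)} where eta = 0.digits[0] digits[1] ... in base b (finite)."""
    return sum(Fraction(dig, b ** (i + 1)) for i, dig in enumerate(digits[m - 1:]))


def block_value(digits, m, d, b):
    """r = eta_m b^(d-1) + eta_(m+1) b^(d-2) + ... + eta_(m+d-1) b^0."""
    r = 0
    for dig in digits[m - 1:m - 1 + d]:
        r = r * b + dig
    return r


digits, b, d = [1, 4, 1, 5, 9, 2, 6, 5, 3, 5], 10, 2   # sample base-10 digits
for m in range(1, len(digits) - d + 2):
    r = block_value(digits, m, d, b)
    x = shifted_fractional_part(digits, m, b)
    print(m, r, Fraction(r, b ** d) <= x < Fraction(r + 1, b ** d))
```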


Exercise 11.18.1. Complete the details of the proof of the equivalence of normality in base $b$ and the sequence $\{\eta b^{m-1}\}_{m \geq 1}$ being uniformly distributed mod 1.




12 In the novel Contact by Carl Sagan, a long string of 0's and 1's is found far out in the base 11 expansion of $\pi$. This is then used to create a picture, which suggests (in the context of the novel) that $\pi$ was created at the convenience of some supernatural being. It's a fun read.


Chapter 12 


Binary quadratic forms 


Let $a$, $b$, and $c$ be given integers. We saw in Corollary 3.1 that the integers that can be represented by the binary linear form $ax + by$ are those integers divisible by $\gcd(a, b)$. We are now interested in what integers can be represented by the binary quadratic form$^1$

$f(x,y) = ax^2 + bxy + cy^2.$

As in the linear case, we can immediately reduce our considerations to the case that $\gcd(a,b,c) = 1$.

The first important result of this type was given by Fermat for the particular example $f(x,y) = x^2 + y^2$, as discussed in section 9.1. The two main results were that an odd prime $p$ can be represented by $f(x,y)$ if and only if $p \equiv 1 \pmod 4$, and that the product of two integers that can be written as the sum of two squares can also be written as the sum of two squares, a consequence of the identity (9.1.1). One can combine these two facts to classify exactly which integers are represented by the binary quadratic form $x^2 + y^2$.


At first sight it looks like it might be difficult to work with the example $f(x,y) = x^2 + 20xy + 101y^2$. However, this can be rewritten as $(x+10y)^2 + y^2$ and so represents exactly the same integers as $g(x,y) = x^2 + y^2$. In other words

$n = f(u,v)$ if and only if $n = g(r,s)$, where $\begin{pmatrix} r \\ s \end{pmatrix} = \begin{pmatrix} 1 & 10 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}.$

This 2-by-2 matrix is invertible over the integers, so we can express $u$ and $v$ as integer linear combinations of $r$ and $s$. Thus every representation of $n$ by $f$ corresponds to one by $g$, and vice versa, a 1-to-1 correspondence, obtained using the invertible linear transformation $(u,v) \mapsto (u + 10v, v)$. Such a pair of quadratic forms, $f$ and $g$, are said to be equivalent; and we have just seen how equivalent binary quadratic forms represent exactly the same integers. The discriminant of $ax^2 + bxy + cy^2$ is $b^2 - 4ac$. We will show that equivalent binary quadratic forms have the same discriminant, so that it is an invariant of the equivalence class of binary quadratic forms. All of this will be discussed in this chapter and, in appendix 12A, we will study generalizations of the identity (9.1.1).

1 “Binary” as in the two variables $x$ and $y$, and “quadratic” as in degree two. The monomials $ax^2$, $bxy$, $cy^2$ each have degree two, since the degree of a term is given by the degree in $x$ plus the degree in $y$.


12.1. Representation of integers by binary quadratic forms 


An integer $N$ is represented by $f$ if there exist integers $m, n$ for which $N = f(m,n)$, and $N$ is properly represented if $(m,n) = 1$ (see exercise 3.9.13 for the same question for linear forms).


Exercise 12.1.1. Prove that if N is squarefree, then all representations of N are proper. 


What integers can be properly represented by $ax^2 + bxy + cy^2$? That is, for what integers $N$ do there exist coprime integers $m, n$ such that

(12.1.1)   $N = am^2 + bmn + cn^2$?

We may reduce to the case that $\gcd(a,b,c) = 1$ by dividing through by $\gcd(a,b,c)$. (If $\gcd(a,b,c) = 1$, then $f$ is a primitive binary quadratic form.) One idea is to complete the square to obtain

(12.1.2)   $4aN = (2am + bn)^2 - dn^2,$

where the discriminant $d := b^2 - 4ac$. Since $d \equiv b^2 \pmod 4$, the discriminant always satisfies
$d \equiv 0$ or $1 \pmod 4$.
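Written out term by term, the completing-the-square step behind (12.1.2) is:

$$\begin{aligned}
4a(am^2 + bmn + cn^2) &= 4a^2m^2 + 4abmn + 4acn^2 \\
&= (4a^2m^2 + 4abmn + b^2n^2) - (b^2 - 4ac)n^2 \\
&= (2am + bn)^2 - dn^2 .
\end{aligned}$$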


There is always at least one binary quadratic form of discriminant $d$, for such $d$, which we call the principal form:

$x^2 - (d/4)y^2$ when $d \equiv 0 \pmod 4$,
$x^2 + xy + \frac{1-d}{4}y^2$ when $d \equiv 1 \pmod 4$.

We call $d$ a fundamental discriminant if $d = D \equiv 1 \pmod 4$, or $d = 4D$ with $D \equiv 2$ or $3 \pmod 4$, and if $D = d/(d,4)$ is squarefree. These are precisely the discriminants for which every binary quadratic form is primitive (see exercise 12.1.3). We met this notion already in exercise 8.16.4 of appendix 8D, when classifying the genuinely different Jacobi symbols.
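The two definitions above, the principal form and the fundamental discriminant, translate directly into code. The sketch below returns the principal form as a triple $[a, b, c]$ and tests whether $d$ is a fundamental discriminant by trial division, so it is only meant for small $|d|$. The helper names are ours.

```python
def principal_form(d):
    """Principal form [1, b, c] of discriminant d, for d = 0 or 1 (mod 4)."""
    if d % 4 == 0:
        return (1, 0, -d // 4)          # x^2 - (d/4) y^2
    if d % 4 == 1:
        return (1, 1, (1 - d) // 4)     # x^2 + xy + ((1-d)/4) y^2
    raise ValueError("a discriminant must be 0 or 1 (mod 4)")


def is_squarefree(n):
    """True if no square k^2 with k > 1 divides n."""
    n = abs(n)
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True


def is_fundamental(d):
    """d = D = 1 (mod 4), or d = 4D with D = 2, 3 (mod 4), and D squarefree."""
    if d % 4 == 1:
        return is_squarefree(d)
    if d % 4 == 0:
        D = d // 4
        return D % 4 in (2, 3) and is_squarefree(D)
    return False


print(principal_form(-20), principal_form(-163))   # (1, 0, 5) (1, 1, 41)
print([d for d in range(-3, -30, -1) if is_fundamental(d)])
```

Python's `%` returns a non-negative remainder for a positive modulus, so the congruence tests work directly for negative $d$.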


When $d < 0$ the right side of (12.1.2) can only take positive values (unless $m = n = 0$), which makes our discussion easier than when $d > 0$. For this reason we will restrict ourselves to the case $d < 0$ here and revisit the case $d > 0$ in appendix 12C. If $d < 0$ and $a < 0$, we replace $a, b, c$ by $-a, -b, -c$, so as to ensure that $am^2 + bmn + cn^2$ is always $> 0$ for $(m,n) \neq (0,0)$; in this case, we call $ax^2 + bxy + cy^2$ a positive definite binary quadratic form.


At the start of this chapter we worked through one example of equivalence of binary quadratic forms, and here is another: The binary quadratic form $x^2 + y^2$ represents the same integers as $X^2 + 2XY + 2Y^2$, for if $N = m^2 + n^2$, then $N = (m-n)^2 + 2(m-n)n + 2n^2$, and similarly if $N = u^2 + 2uv + 2v^2$, then $N = (u+v)^2 + v^2$. The reason is that the substitution

$\begin{pmatrix} x \\ y \end{pmatrix} = M \begin{pmatrix} X \\ Y \end{pmatrix}$ where $M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

transforms $x^2 + y^2$ into $X^2 + 2XY + 2Y^2$, and the transformation is invertible, since $\det M = 1$. We therefore say that $x^2 + y^2$ and $X^2 + 2XY + 2Y^2$ are equivalent, which we denote by

$x^2 + y^2 \sim X^2 + 2XY + 2Y^2.$


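Much more generally define

$\mathrm{SL}(2,\mathbb{Z}) = \left\{ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} : \alpha, \beta, \gamma, \delta \in \mathbb{Z} \text{ and } \alpha\delta - \beta\gamma = 1 \right\}.$

We can represent the binary quadratic form as

$ax^2 + bxy + cy^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$

Its discriminant is $-4$ times the determinant of $\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$. We deduce that if

$\begin{pmatrix} x \\ y \end{pmatrix} = M \begin{pmatrix} X \\ Y \end{pmatrix}$ where $M = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \in \mathrm{SL}(2,\mathbb{Z})$,

then

$AX^2 + BXY + CY^2 := ax^2 + bxy + cy^2 = \begin{pmatrix} X & Y \end{pmatrix} M^{T} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} M \begin{pmatrix} X \\ Y \end{pmatrix},$

so that

(12.1.3)   $\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix} = M^{T} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} M,$

which yields the somewhat painful looking explicit formulas

(12.1.4)   $A = f(\alpha,\gamma) = a\alpha^2 + b\alpha\gamma + c\gamma^2,$
$B = 2a\alpha\beta + (\alpha\delta + \beta\gamma)b + 2\gamma\delta c,$
$C = f(\beta,\delta) = a\beta^2 + b\beta\delta + c\delta^2.$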
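The formulas (12.1.4) can be checked mechanically. The Python sketch below applies a matrix $M \in \mathrm{SL}(2,\mathbb{Z})$ to $[a,b,c]$ using (12.1.4); the function names are ours. Starting from $x^2 + y^2$, it reproduces the two examples from the start of this chapter, $X^2 + 2XY + 2Y^2$ and $X^2 + 20XY + 101Y^2$.

```python
def transform(form, M):
    """Apply (x, y)^T = M (X, Y)^T to [a, b, c]; return [A, B, C] via (12.1.4)."""
    a, b, c = form
    (alpha, beta), (gamma, delta) = M
    assert alpha * delta - beta * gamma == 1, "M must lie in SL(2, Z)"

    def f(x, y):
        return a * x * x + b * x * y + c * y * y

    A = f(alpha, gamma)
    B = 2 * a * alpha * beta + (alpha * delta + beta * gamma) * b + 2 * gamma * delta * c
    C = f(beta, delta)
    return (A, B, C)


def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c


print(transform((1, 0, 1), ((1, 1), (0, 1))))    # x^2 + y^2 becomes X^2 + 2XY + 2Y^2
print(transform((1, 0, 1), ((1, 10), (0, 1))))   # x^2 + y^2 becomes X^2 + 20XY + 101Y^2
```

Comparing `discriminant` before and after a transformation gives a quick numerical check of exercise 12.1.5.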


When working with binary quadratic forms it is convenient to represent $ax^2 + bxy + cy^2$ by the notation $[a,b,c]$. We have just proven the following.


Proposition 12.1.1. If $f = [a,b,c] \sim F = [A,B,C]$, then there exist integers $\alpha, \beta, \gamma, \delta$ with $\alpha\delta - \beta\gamma = 1$ for which $A = f(\alpha,\gamma)$ and $C = f(\beta,\delta)$. Moreover $f$ and $F$ represent the same integers, and there is a 1-to-1 correspondence between their representations, and between their proper representations, of a given integer.


Exercise 12.1.2. (a) Suppose that $d$ is a fundamental discriminant. Prove that the character $(d/\cdot)$ has conductor dividing $d$.
(b) Prove that for any non-zero integer $d$, the character $(d/\cdot)$ has conductor that divides $4d$.

The conductor of $f(\cdot)$ is the minimum $p > 0$ such that $f(n + p) = f(n)$ for all integers $n$.


Exercise 12.1.3. Suppose that $d \equiv 0$ or $1 \pmod 4$. Show that every binary quadratic form of discriminant $d$ is primitive if and only if $d$ is a fundamental discriminant.


Exercise 12.1.4. (a) Show that if $d < 0$, then $am^2 + bmn + cn^2$ has the same sign as $a$, no matter what the choices of integers $m$ and $n$, not both zero.
(b) Show that if $ax^2 + bxy + cy^2$ is positive definite, then $a, c > 0$.
(c) Show that if $d > 0$, then $am^2 + bmn + cn^2$ can take both positive and negative values, by making explicit choices of integers $m, n$.




Exercise 12.1.5. Use (12.1.3) to show that two equivalent binary quadratic forms have the same 
discriminant. 


Exercise 12.1.6. Show that every binary quadratic form $x^2 + bxy + cy^2$ with leading coefficient 1 is equivalent to the principal form of the same discriminant.


Exercise 12.1.7. In each part, determine whether the two binary quadratic forms are equivalent. 
If so, make the equivalence explicit; if not, explain why not. 

(a) $y^2 + xy + 4x^2$ and $x^2 - 5xy + 10y^2$.

(b) $x^2 + 3xy + 5y^2$ and $3x^2 - 4xy + 11y^2$.


12.2. Equivalence classes of binary quadratic forms 


In this section we will develop an algorithm that will allow us to show, for example, that $29X^2 + 82XY + 58Y^2$ is equivalent to $x^2 + y^2$. We do this as it is surely more intuitive to work with the latter form rather than the former. Gauss observed that every equivalence class of binary quadratic forms (with $d < 0$) contains a unique smallest representative, called the reduced representative, which we now make precise: The quadratic form $ax^2 + bxy + cy^2$ with discriminant $d < 0$ is reduced if

$-a < b \leq a \leq c$, and $b \geq 0$ whenever $a = c$.


Theorem 12.1. Every positive definite binary quadratic form is equivalent to a 
reduced form. 


Proof. We will define a sequence of properly equivalent forms; the algorithm terminates when we reach one that is reduced. Given a form $[a,b,c]$, we use one of three transformations, described in terms of matrices from $\mathrm{SL}(2,\mathbb{Z})$:

(i) If $c < a$, the transformation
$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$
yields the form $[c,-b,a]$, which is properly equivalent to $[a,b,c]$ (as $ax^2 + bxy + cy^2 = a(-Y)^2 + b(-Y)(X) + c(X)^2 = cX^2 - bXY + aY^2$). Hence $A = c < a = C$.

(ii) If $b > a$ or $b \leq -a$, then select $B$ to be the absolutely least residue of $b \pmod{2a}$, so that $-a < B \leq a$, say $B = b - 2ka$. The transformation matrix will be
$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}.$
The resulting form $[A,B,C]$ with $A = a$ is properly equivalent to $[a,b,c]$, where $-A < B \leq A$.

(iii) If $c = a$ and $-a < b < 0$, then the transformation
$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}$
yields the form $[A,B,A]$ with $A = a$ and $B = -b$, so that $0 < B < A$.


If the resulting form is not reduced, then repeat the algorithm. If none of these hypotheses holds, then one can easily verify that the form is reduced. To prove that the algorithm terminates in finitely many steps we follow the leading coefficient $a$: $a$ starts as a positive integer. Each transformation of type (i) reduces the size of $a$. It stays the same after transformations of type (ii) or (iii), but after a type (iii) transformation the algorithm terminates, and after a type (ii) transformation we either have another type (i) transformation or else the algorithm stops after at most one more transformation. Hence the algorithm finishes in no more than $2a + 1$ steps.


Examples. Applying the reduction algorithm to the form $[76, 217, 155]$ of discriminant $-31$, one finds the sequence of forms

$[76, 65, 14],\ [14, -65, 76],\ [14, -9, 2],\ [2, 9, 14],\ [2, 1, 4],$

the last being the sought-after reduced form. Similarly the form $[11, 49, 55]$ of discriminant $-19$ gives the sequence of forms $[11, 5, 1], [1, -5, 11], [1, 1, 5]$.
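You can run the reduction algorithm from the proof of Theorem 12.1 by hand, but a few lines of Python make it easy to experiment. This sketch assumes a positive definite input, and the function names are ours. It checks the hypotheses in the order (i), (ii), (iii). Steps (i) and (iii) send $[a,b,c]$ to $[c,-b,a]$. Step (ii) replaces $b$ by $B = b - 2ka$ and $c$ by $C = ak^2 - bk + c$.

```python
def is_reduced(form):
    a, b, c = form
    return -a < b <= a <= c and not (a == c and b < 0)


def reduce_form(form):
    """Return the forms produced by the reduction algorithm, ending with the
    reduced form. Assumes [a, b, c] is positive definite."""
    a, b, c = form
    steps = []
    while not is_reduced((a, b, c)):
        if c < a:                        # step (i): (x, y) = (-Y, X)
            a, b, c = c, -b, a
        elif b > a or b <= -a:           # step (ii): (x, y) = (X - kY, Y)
            B = b % (2 * a)              # representative in [0, 2a)
            if B > a:
                B -= 2 * a               # now -a < B <= a
            k = (b - B) // (2 * a)
            a, b, c = a, B, a * k * k - b * k + c
        else:                            # step (iii): here c == a and -a < b < 0
            a, b, c = c, -b, a
        steps.append((a, b, c))
    return steps


print(reduce_form((76, 217, 155)))  # [(76, 65, 14), (14, -65, 76), (14, -9, 2), (2, 9, 14), (2, 1, 4)]
print(reduce_form((11, 49, 55)))    # [(11, 5, 1), (1, -5, 11), (1, 1, 5)]
```

For instance, `reduce_form((29, 82, 58))` ends at `(1, 0, 1)`. This confirms the claim at the start of this section that $29X^2 + 82XY + 58Y^2$ is equivalent to $x^2 + y^2$.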


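This proof of Theorem 12.1 can be rephrased to prove Theorem 1.2 of section 1.10 (of appendix 1A), that every matrix in $\mathrm{SL}(2,\mathbb{Z})$ can be represented as the product of powers of the matrices $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$: The matrices used in the transformations in the proof of Theorem 12.1 are $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = S$ and $\begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix} = T^{-k}$.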


The very precise conditions in the definition of “reduced” were chosen so that every positive definite binary quadratic form is properly equivalent to a unique reduced form. The key to proving uniqueness is exercise 12.6.1; the (messy) details are completed in exercise 12.6.2.


12.3. Congruence restrictions on the values of a binary quadratic form 


What restrictions are there on the values that can be taken by a binary quadratic form (in analogy with Theorem 9.2)?


Proposition 12.3.1. Let $d = b^2 - 4ac$ where $(a,b,c) = 1$.

(i) If integer $N$ is properly represented by $ax^2 + bxy + cy^2$, then $d$ is a square mod $4N$.

(ii) If $d$ is a square mod $4N$, then there exists a binary quadratic form of discriminant $d$ that properly represents $N$.


Proof. (ii) If $d \equiv b^2 \pmod{4N}$, then $d = b^2 - 4Nc$ for some integer $c$, and so $Nx^2 + bxy + cy^2$ is a quadratic form of discriminant $d$ which represents $N = N \cdot 1^2 + b \cdot 1 \cdot 0 + c \cdot 0^2$.

(i) Suppose that $N = am^2 + bmn + cn^2$ with $(m,n) = 1$. Then $(2am + bn)^2 - dn^2 = 4aN$ so that $dn^2 \equiv (2am + bn)^2 \pmod{4N}$; that is, $dn^2$ is a square mod $4N$ and, analogously (using $(2cn + bm)^2 - dm^2 = 4cN$), $dm^2$ is a square mod $4N$. Now if $p$ is a prime such that $p^k \| 4N$, then $p$ does not divide at least one of $m$ and $n$, as $(m,n) = 1$. We deduce that $d$ is a square mod $p^k$ from the fact that $dn^2$ is a square mod $p^k$ if $p$ does not divide $n$, and from the fact that $dm^2$ is a square mod $p^k$ if $p$ does not divide $m$. The result, that $d$ is a square mod $4N$, now follows from the Chinese Remainder Theorem.
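The proof of part (ii) is constructive: a solution $b$ of $b^2 \equiv d \pmod{4N}$ immediately gives the form $[N, b, (b^2 - d)/4N]$. The sketch below finds such a $b$ by brute force, which is only sensible for small $N$. The function name is ours.

```python
def form_representing(N, d):
    """Return [N, b, c] of discriminant d (so it represents N = f(1, 0)) if d is a
    square mod 4N, else None. Brute-force search; fine only for small N."""
    for b in range(2 * N):                 # (b + 2N)^2 = b^2 (mod 4N), so b < 2N suffices
        if (b * b - d) % (4 * N) == 0:
            return (N, b, (b * b - d) // (4 * N))
    return None


print(form_representing(3, -20))    # (3, 2, 2)
print(form_representing(7, -20))    # (7, 6, 2)
print(form_representing(5, -23))    # None: -23 is not a square mod 20
```

The form $[7, 6, 2]$ has discriminant $36 - 56 = -20$. Running the reduction algorithm of Theorem 12.1 on it gives $[2, 2, 3]$, consistent with $7 = 2 + 2 + 3$ being represented by $2x^2 + 2xy + 3y^2$.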




For a given odd prime $p$, Proposition 12.3.1 tells us that $p$ is represented by some binary quadratic form of discriminant $d$ if and only if $(d/p) = 1$ or $0$. However it does not tell us which binary quadratic form. In section 9.6 we could not immediately determine which of the two reduced binary quadratic forms of discriminant $-20$, namely $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$, represents which primes $p$ with $(-20/p) = 1$. There we found we could distinguish which prime was represented by which form by also studying the values of $(p/d)$. We now see how this works out in general.


We can appeal to Corollary to restrict the possibilities for the binary quadratic forms of discriminant $d$ that represent $N$. Given a primitive binary quadratic form $f$ of discriminant $d$ we define, for each odd prime $p$ dividing $d$,

$\sigma_f(p) = (a/p)$ if $p$ does not divide $a$, and $\sigma_f(p) = (c/p)$ if $p$ does divide $a$.

If $p$ divides $a$, then $p$ divides $d + 4ac = b^2$ and so divides $b$, and therefore cannot divide $c$ as $f$ is primitive. Therefore $\sigma_f(p)$ equals $1$ or $-1$ for each such $p$.


Exercise 12.3.1. Prove that if $f \sim g$, then $\sigma_f(p) = \sigma_g(p)$ for all odd primes $p$ dividing $d$.


Corollary 12.3.1. Suppose that $d$ is a fundamental discriminant and that $N$ is a squarefree integer for which $(N,d) = 1$. If $d$ is a square mod $4N$, then there exists a binary quadratic form $f$ of discriminant $d$ that properly represents $N$ such that

$\sigma_f(p) = (N/p)$ for every odd prime $p$ dividing $d$.


Proof. There exists a binary quadratic form $f$ of discriminant $d$ that properly represents $N$, by Proposition 12.3.1(ii). Therefore $N$ is represented by inserting rationals into $f$ and this happens, by Corollary 9.4.1, if and only if $(N/p) = \sigma_f(p)$ for every odd prime $p$ dividing $d$.


When $d = -20$ we have $\sigma_f(5) = 1$ for $f = x^2 + 5y^2$ and $\sigma_f(5) = (2/5) = -1$ for $f = 2x^2 + 2xy + 3y^2$. This can certainly settle such issues in several cases.

There are three reduced quadratic forms, $[1, 1, 6]$ and $[2, \pm 1, 3]$, with $d = -23$. However $\sigma_f(23) = 1$ for each of these, so this does not help us to distinguish between the integers represented by these quadratic forms. This case is much more complicated and beyond the scope of this book.
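The invariants $\sigma_f(p)$ are easy to compute using Euler's criterion, $(a/p) \equiv a^{(p-1)/2} \pmod p$. The sketch below reproduces the values quoted above for $d = -20$ and $d = -23$. The helper names are ours.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    t = pow(a % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t          # t is 0, 1 or p - 1


def sigma(form, p):
    """sigma_f(p) for a primitive form [a, b, c] and an odd prime p dividing d."""
    a, b, c = form
    return legendre(a, p) if a % p else legendre(c, p)


print(sigma((1, 0, 5), 5), sigma((2, 2, 3), 5))                      # 1 -1
print([sigma(f, 23) for f in [(1, 1, 6), (2, 1, 3), (2, -1, 3)]])   # [1, 1, 1]
```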


We develop these ideas further in section 12.11 of appendix 12B.


Exercise 12.3.2. Prove that if $p_1, \dots, p_k$ are distinct primes that are each represented by some form of discriminant $d$, then $p_1 \cdots p_k$ is also represented by some form of discriminant $d$.

12.4. Class numbers


Theorem 12.2. If $d < 0$, then there are only finitely many reduced binary quadratic forms of discriminant $d$.


Proof. For a reduced binary quadratic form, $|d| = 4ac - b^2 \geq 4a \cdot a - a^2 = 3a^2$ and so $a$ is a positive integer for which

$a \leq \sqrt{|d|/3}.$

Therefore for a given $d < 0$ there are only finitely many $a$, and so $b$ (as $|b| \leq a$), but then $c = (b^2 - d)/4a$ is determined, and so there are only finitely many reduced binary quadratic forms of discriminant $d$.


Let $h(d)$ denote the class number, the number of equivalence classes of binary quadratic forms of discriminant $d$. We have just shown $h(d)$ is finite, and the proof of Theorem 12.2 even describes an algorithm to easily find all the reduced binary quadratic forms of a given discriminant $d < 0$. In fact $h(d) \geq 1$ since we always have the principal form. If $h(d) = 1$, then all binary quadratic forms of discriminant $d$ are equivalent to the principal form.
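The proof of Theorem 12.2 gives an algorithm: loop over $a \leq \sqrt{|d|/3}$ and $-a < b \leq a$, and keep those $c = (b^2 - d)/4a$ that are integers with $[a,b,c]$ reduced. The Python sketch below does this. It counts only primitive forms, which is an assumption on our part. This makes no difference for fundamental discriminants, where every form is primitive by exercise 12.1.3. The function names are ours.

```python
from math import gcd


def reduced_forms(d):
    """All primitive reduced forms [a, b, c] with b^2 - 4ac = d < 0."""
    if d >= 0 or d % 4 not in (0, 1):
        raise ValueError("need d < 0 with d = 0 or 1 (mod 4)")
    forms = []
    a = 1
    while 3 * a * a <= -d:                  # reduced forms have 3a^2 <= |d|
        for b in range(-a + 1, a + 1):      # -a < b <= a
            if (b * b - d) % (4 * a):
                continue                    # c would not be an integer
            c = (b * b - d) // (4 * a)
            if c < a or (c == a and b < 0):
                continue                    # not reduced
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
        a += 1
    return forms


def class_number(d):
    return len(reduced_forms(d))


print(reduced_forms(-20))     # [(1, 0, 5), (2, 2, 3)]
print(reduced_forms(-23))     # [(1, 1, 6), (2, -1, 3), (2, 1, 3)]
print([d for d in (-3, -4, -7, -8, -11, -19, -43, -67, -163) if class_number(d) == 1])
```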


Example. If $d = -163$, then $|b| \leq a \leq \sqrt{163/3} < 8$. But $b$ is odd, since $b \equiv b^2 = d + 4ac \equiv d \pmod 2$, so $|b| = 1, 3, 5$, or $7$. Therefore $ac = (b^2 + 163)/4 = 41, 43, 47$, or $53$, a prime, with $0 < a \leq c$ and hence $a = 1$. Since $b$ is odd and $-a < b \leq a$, we deduce that $b = 1$ and so $c = 41$. Hence $x^2 + xy + 41y^2$ is the only reduced binary quadratic form of discriminant $-163$, and therefore $h(-163) = 1$.


Exercise 12.4.1. Determine all of the reduced binary quadratic forms of discriminant $d$ for $-20 \leq d \leq -1$ as well as for $d = -28, -43, -67, -167$, and $-171$.


Exercise 12.4.2. Determine all of the reduced binary quadratic forms of discriminant $d$ for $d = -3, -15, -23, -39, -47, -87, -71$, and $-95$.


Exercise 12.4.3. Determine all of the reduced binary quadratic forms of discriminant $d$ for $d = -4, -20, -56$, and $-104$.


Exercise 12.4.4. Prove that if $ax^2 + bxy + cy^2$ is a reduced binary quadratic form of discriminant $d < 0$, then $|c| \geq \sqrt{|d|}/2$.


12.5. Class number one 


Corollary 12.5.1. Suppose that $h(d) = 1$. Then $N$ is properly represented by the form of discriminant $d$ if and only if $d$ is a square mod $4N$.


Proof. This follows immediately from Proposition 12.3.1, since there is just one equivalence class of quadratic forms of discriminant $d$, and forms in the same equivalence class represent the same integers by Proposition 12.1.1.


We have $h(-4) = 1$ and so Corollary 12.5.1 implies that integer $N$ is properly represented by $x^2 + y^2$ if and only if $-4$ is a square mod $4N$. This is more or less Theorem 9.2 (and can be deduced from its proof).

In the example above we showed that $x^2 + xy + 41y^2$ is the only reduced binary quadratic form of discriminant $-163$. This implies, by Corollary 12.5.1, that if prime $p \neq 2, 163$, then it can be represented by the binary quadratic form $x^2 + xy + 41y^2$ if and only if $(-163/p) = 1$.

In exercise 12.4.1 we exhibited nine fundamental discriminants $d < 0$ with $h(d) = 1$, namely $d = -3, -4, -7, -8, -11, -19, -43, -67$, as well as $-163$. It turns out these are the only ones with class number one.$^2$ Therefore, as in the example above, if $p \nmid 2d$, then

$p$ is represented by $x^2 + y^2$ if and only if $(-1/p) = 1$;
$p$ is represented by $x^2 + 2y^2$ if and only if $(-2/p) = 1$;
$p$ is represented by $x^2 + xy + y^2$ if and only if $(-3/p) = 1$;
$p$ is represented by $x^2 + xy + 2y^2$ if and only if $(-7/p) = 1$;
$p$ is represented by $x^2 + xy + 3y^2$ if and only if $(-11/p) = 1$;
$p$ is represented by $x^2 + xy + 5y^2$ if and only if $(-19/p) = 1$;
$p$ is represented by $x^2 + xy + 11y^2$ if and only if $(-43/p) = 1$;
$p$ is represented by $x^2 + xy + 17y^2$ if and only if $(-67/p) = 1$;
$p$ is represented by $x^2 + xy + 41y^2$ if and only if $(-163/p) = 1$.


Euler noticed that the polynomial $x^2 + x + 41$ is prime for $x = 0, 1, 2, \dots, 39$, and similarly for the other polynomials above. Rabinowiscz proved that this is an “if and only if” condition:

Theorem 12.3 (Rabinowiscz's criterion). We have $h(1 - 4A) = 1$ for $A \geq 2$ if and only if $n^2 + n + A$ is prime for $n = 0, 1, 2, \dots, A - 2$.

At $n = A - 1$ the polynomial takes value $(A-1)^2 + (A-1) + A = A^2$, which is composite. We will prove Rabinowiscz's criterion below.
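The polynomial side of the criterion is easy to test numerically: search for the $A \geq 2$ such that $n^2 + n + A$ is prime for $0 \leq n \leq A - 2$. The sketch below uses trial-division primality, and the names are ours. Up to 100 it finds exactly $A = 2, 3, 5, 11, 17, 41$. These match $d = 1 - 4A = -7, -11, -19, -43, -67, -163$ in the list of discriminants with class number one given above.

```python
def is_prime(n):
    if n < 2:
        return False
    k = 2
    while k * k <= n:
        if n % k == 0:
            return False
        k += 1
    return True


def prime_run(A):
    """True if n^2 + n + A is prime for every n = 0, 1, ..., A - 2."""
    return all(is_prime(n * n + n + A) for n in range(A - 1))


print([A for A in range(2, 101) if prime_run(A)])   # [2, 3, 5, 11, 17, 41]
```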


What about when the class number is not one? In the example with $d = -20$ we have $h(-20) = 2$; the two reduced forms are $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$. By Proposition 12.3.1, $p$ is represented by at least one of these two forms if and only if $(-5/p) = 0$ or $1$, that is, if $p \equiv 1, 3, 7$, or $9 \pmod{20}$ or $p = 2$ or $5$. Can we decide which of these primes are represented by which of the two forms? Note that if $p = x^2 + 5y^2$, then $(p/5) = 0$ or $1$ and so $p = 5$ or $p \equiv \pm 1 \pmod 5$, and thus $p \equiv 1$ or $9 \pmod{20}$. If $p = 2x^2 + 2xy + 3y^2$, then $2p = (2x + y)^2 + 5y^2$ and so $p = 2$ or $(2p/5) = 1$; that is, $(p/5) = -1$, and hence $p \equiv 3$ or $7 \pmod{20}$. Hence we have proved

$p$ is represented by $x^2 + 5y^2$ if and only if $p = 5$, or $p \equiv 1$ or $9 \pmod{20}$;
$p$ is represented by $2x^2 + 2xy + 3y^2$ if and only if $p = 2$, or $p \equiv 3$ or $7 \pmod{20}$.
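The two congruence conditions just proved are easy to confirm for small primes by brute force. The sketch below searches for representations directly, and the names are ours. The search bounds $|x| \leq \sqrt{2p}$ and $|y| \leq \sqrt{p}$ suffice for both forms, since $x^2 + 5y^2 \geq x^2$ and $2(2x^2 + 2xy + 3y^2) = (2x + y)^2 + 5y^2$.

```python
from math import isqrt


def represents(form, n):
    """Brute force: is n = a x^2 + b x y + c y^2 for some integers x, y?"""
    a, b, c = form
    X, Y = isqrt(2 * n) + 1, isqrt(n) + 1
    return any(a * x * x + b * x * y + c * y * y == n
               for x in range(-X, X + 1) for y in range(-Y, Y + 1))


def is_prime(n):
    return n > 1 and all(n % k for k in range(2, isqrt(n) + 1))


for p in filter(is_prime, range(2, 60)):
    print(p, p % 20, represents((1, 0, 5), p), represents((2, 2, 3), p))
```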


That is, we can distinguish which primes can be represented by which binary quadratic form of discriminant $-20$, through congruence conditions, despite the fact that the class number is not one. However we cannot always do this; that is, we cannot always distinguish which primes are represented by which binary quadratic form of discriminant $d$. It is understood how to recognize those discriminants $d$ for which we can determine which binary quadratic forms of discriminant $d$ represent which integers simply through congruence conditions (see section 12.11 of appendix 12B). These idoneal numbers were recognized by Euler. He found 65 of them, and no more are known; it is an open conjecture as to whether Euler's list is complete. It is known that there can be at most one further undiscovered idoneal number, but it seems unlikely that the techniques used can rule out this putative example.$^3$

2 The proof that the above list gives all of the $d < 0$ for which $h(d) = 1$ has an interesting history. By 1934 it was known that there is no more than one further such $d$, but that putative $d$ could not be ruled out by the method. In 1952, Kurt Heegner, a German school teacher, proposed an extraordinary proof that there are no further $d$. At the time his paper was ignored since it was based on a result from an old book (of Weber) whose proof was known to be incomplete. In 1966 Alan Baker gave a very different (and more obviously correct) proof that this was the complete list of discriminants with class number one, and this was widely acknowledged to be correct. However, soon afterwards Stark realized that the proofs in Weber are easily corrected, so that Heegner's work had been fundamentally correct. Heegner was subsequently given credit for solving this famous problem, but sadly only after he had died. Heegner's paper contains a most extraordinary construction, widely regarded to be one of the most creative and influential in modern number theory.
Exercise 12.5.1. (a) Determine the two reduced binary quadratic forms of discriminant $-15$.

(b) Determine which reduced residue classes can be represented by some form of discriminant $-15$.

(c) Distinguish which primes are represented by which form (with proof).


Proof of Rabinowiscz's criterion. We begin by showing that $f(n) := n^2 + n + A$ is composite for some integer $n$ in the range $0 \leq n \leq A - 2$, if and only if $d = 1 - 4A$ is a square mod $4p$ for some prime $p < A$. For if $n^2 + n + A$ is composite, let $p$ be its smallest prime factor so that $p \leq f(n)^{1/2} < f(A-1)^{1/2} = A$. Then $(2n+1)^2 - d = 4(n^2 + n + A) \equiv 0 \pmod{4p}$ so that $d$ is a square mod $4p$. On the other hand if $d$ is a square mod $4p$ where $p$ is a prime $\leq A - 1$, select $m$ to be the smallest positive integer such that $d \equiv m^2 \pmod{4p}$. Then $m < 2p$ (or else replace $m$ by $4p - m$) and $m$ is odd (as $d$ is odd), so write $m = 2n + 1$ and then $0 \leq n \leq p - 1 \leq A - 2$ with $d \equiv (2n+1)^2 \pmod{4p}$. Therefore $p$ divides $n^2 + n + A$ with $p < A = f(0) \leq f(n)$ so that $n^2 + n + A$ is composite.

Now we show that $h(d) > 1$ if and only if $d = 1 - 4A$ is a square mod $4p$ for some prime $p < A$. If $h(d) > 1$, then there exists a reduced binary quadratic form $ax^2 + bxy + cy^2$ of discriminant $d$ with $1 < a \leq \sqrt{|d|/3} < A$ by the proof of Theorem. If $p$ is a prime factor of $a$, then $p \leq a < A$ and $d = b^2 - 4ac$ is a square mod $4p$. On the other hand if $d$ is a square mod $4p$ for some prime $p < A$, and $h(d) = 1$, then $p$ is represented by $x^2 + xy + Ay^2$ by Proposition 12.3.1(ii). Now $y \neq 0$ as $p$ is not a square. Therefore $4p = (2x + y)^2 + |d|y^2 \geq 0^2 + |d| \cdot 1^2 = |d|$; that is, $p \geq A$, a contradiction. (We will extend this proof to obtain more on the small values taken by any binary quadratic form of negative discriminant, in exercise 12.6.1(a).) Hence $h(d) > 1$.

Putting these two results together, we deduce that $h(d) > 1$ if and only if $f(n) := n^2 + n + A$ is composite for some integer $n$ in the range $0 \leq n \leq A - 2$, which implies Rabinowiscz's criterion.


Exercise 12.5.2. Prove that if $n^2 + n + A$ is prime for all integers $n$ in the range $0 \leq n \leq B$, where $1 \leq B < (A-1)/2$, then $\left(\frac{1-4A}{p}\right) = -1$ for all primes $p \leq 2B + 1$.


The class number one problem for even negative fundamental discriminants is 
not difficult: 


Theorem 12.4. If $h(d) = 1$ with $d = -4n$ for $n \in \mathbb{N}$, then $n = 1, 2, 3, 4$, or $7$.


Proof. Suppose that $h(-4n) = 1$. Then $n$ must be a prime power or else there exist coprime integers $1 < a < c$ for which $ac = n$ and so $[a, 0, c]$ is a non-principal reduced form of discriminant $-4n$. Moreover $n + 1$ must be an odd prime or a power of 2 or else there exist integers $1 < a < c$ with $\gcd(a, 2, c) = 1$ for which $ac = n + 1$ and so $[a, 2, c]$ is a non-principal reduced form of discriminant $-4n$.


3 We therefore find ourselves in much the same situation as for class number one before Heegner's work, as discussed in the last footnote.


452 12. Binary quadratic forms 


One of n and n+ 1 is even and hence must be a power of 2 (from the previous 
paragraph). If n = 2* with k > 4, then we have the non-principal reduced form 
(4,4, 2*-?2 +1], and ifn+1 = 2" with k > 6, then we have the non-principal reduced 
form [8,6,2*~? + 1]. 

Therefore if h(—4n) = 1, then n = 1,2,4, or 8 or n+ 1 = 2,4,8,16, or 32. We 
can rule out n = 15 (as 15 is composite) and n = 8 (as 9 is not an odd prime) and 
n = 31 (as [5,4, 7] is a non-principal reduced form of discriminant —124). We know 
that h(—4n) = 1 for n = 1,2,3,4, and 7 by exercise [12.4.1 


These discriminants have a beautiful property. 


Corollary 12.5.2. Let n =1,2,3,4, or 7. If p is a prime that does not divide 4n, 


then p can be written as u? + nv? if and only if (=2) =, 


Proof. As we just discussed h(—4n) = 1, and so all binary quadratic forms of 
discriminant —4n are equivalent to x? + ny?. By Proposition [12.3.1] p can be 
represented by some form of discriminant —4n if and only if —4n is a square mod 
p, and the result follows. 


We had already discussed representations of p by x? + y?, 2? + 2y?, x? + 3y? 
in sections [9.1] and [9.2] and x? + 4y? = x? + (2y)? follows easily from x? + y?. This 
leaves only the most interesting of the cases of Corollary [12.5.2 


p=x27 +7y? if and only if p = 1,9, 11, 15, 23, or 25 (mod 28). 


Exercise 12.5.3. Let q be a prime = —1 (mod 4). Prove that (2) = —1 for all primes p < att 


if and only if h(—q) = 1. This result suggests that finding a small prime p with (2) = 1 can be 


a deep problem (see appendix 8B for a discussion of small quadratic residues). 


For much more on the values taken by binary quadratic forms, particularly the 
prime values, we recommend David Cox’s wonderful book [I]. 


References for this chapter 


[1] David A. Cox, Primes of the form «x? + ny”, Wiley, 1989. 


[2] Dorian Goldfeld, Gauss’s class number problem for imaginary quadratic fields, Bull. Amer. Math. 
Soc. 13 (1985), 23-37. 


Additional exercises 


These last questions get considerably more involved but may be of interest to stu- 
dents interested in further pursuing number theory. 


Exercise 12.6.1. Suppose that f(a, y) = ax? + bry + cy? is a reduced binary quadratic form. 
(a) Show that if am? + bmn + en? < a— |b] +c with (m,n) = 1, then |ml, |n| < 1. 
(b) Prove that the least values properly represented by f are a < c < a— |b| +c, the first 
two properly represented twice, the last twice unless b = 0, in which case it is properly 
represented four times. 


Exercise 12.6.2. We now use the results of exercise [12.6.1}to understand equivalences between 
primitive reduced binary quadratic forms. The idea is to recognize a reduced binary quadratic 
form by the smallest values it properly represents. 


Questions on binary quadratic forms 453 


(a) 


Prove that: 
e If0 < |b] <a<c, then [a,b,c] properly represents a, c, and a — |b| +c in exactly 2, 2, 
and 2 different ways, respectively. 
e If0 < |b] =a <c, then [a,b,c] properly represents a, and c = a — |b| + c in exactly 2, 
and 4 different ways, respectively. 
e If0 < |b] <a=c, then [a,b,c] properly represents a = c, and a — |b| +c in exactly 4, 
and 2 different ways, respectively. 
e If 0 = |b| <a<c, then [a,b,c] properly represents a, c, and a — |b| +c in exactly 2, 2, 
and 4 different ways, respectively. 
e [1,1,1] properly represents 1 in exactly six different ways. 
e [1,0,1] properly represents both 1 and 2 in exactly four different ways. 
Deduce that if [a, b, c], and [A, B, C] are equivalent primitive reduced binary quadratic forms, 
then A=a, C=c, and B=bor —b. 
Use exercise[I2.6.1{a) to show that the entries of a matrix representing such an equivalence 
must each be —1, 0, or 1. 
Prove that distinct primitive reduced binary quadratic forms are all inequivalent. Together 
with Theorem[/2.1|this implies that every positive definite binary quadratic form is properly 
equivalent to a unique reduced form. 
Suppose that M € SL(2,Z) transforms a primitive reduced binary quadratic form to itself 


(this is an automorphism). Show that M = +1, except in the following two cases: 
e [1,1,1] has automorphisms given by +/, 6 ) and + ‘© a 


e [1,0,1] has automorphisms given by +/ and ( 7 a 


Exercise 12.6.3. (a) Show that if [A, B,C] ~ [a,b,c], then [A, —B, C] ~ [a, —6, c]. 


(b) 


(f) 


(g) 
(h) 


Use exercise[12.6.2[d) to show that if [a,b,c] is reduced, then [a, b,c] ~ [a, —b, ¢] if and only 
ifb=0,b=a, ora=c. 

Deduce that [A, B,C] ~ [A,—B,C] if and only if they are equivalent to a quadratic form 
[a,0,c], [a, a,c], or [a, b, a]. 

Prove that [a, a,c] ~ [c, 2c — a, c]. 

If d < 0 is odd, then show that the primitive reduced forms are given by taking each 
factorization —d = rs with 0 <r < sand (r,s) =1, 


[a,a,c] if s > 3r wherea=r andc=(r+s)/4, 
[a,b,a] if s < 3r where a= (r+ s)/4 and b = (s—r)/2. 


If d < 0 is even, then show that the primitive reduced forms are given by taking each 
factorization —d/4 = rs with 0 <r < sand (r,s) =1, 


[a,0,c] witha=randc=s, 
[a,a,c] if s > 3r where a = 2r and c= (r+s)/2, 
[a,b,a] if s < 3r where a= (r+s)/2 andb=s—r. 


Note that the last two cases hold only if d/4 is odd. 

Show that each binary quadratic form either represents both r and s, or both 2r and 2s. 
(In (d), take f(1,-—2) = s in the first case; f(1,1) = s, f(1,—1) =, in the second case.) 
Deduce that if d < 0 is a fundamental discriminant, then there are exactly 2'~! reduced 
binary quadratic forms for which [a,b,c] ~ [a,—b,c], where t is the number of odd prime 
divisors of |d|, unless 4||d in which case there are 2°. 


Exercise 12.6.4. (a) Prove that x? + 6y? and 22? + 3y? are the only binary quadratic forms, 


(b) 
(c) 


up to equivalence, of discriminant —24. 

Prove that prime p can be written in the form a? + 6b? if and only if p= 1 or 7 (mod 24). 
Prove that prime p can be written in the form 2u? + 3v? if and only if p = 2 or 3, or p=5 
or 11 (mod 24). 


454 12. Binary quadratic forms 


We can refine this further: 
(d) Prove that prime p can be written in the form a? + 24B? if and only if p=1 (mod 24). 
(e) Prove that prime p can be written in the form 8U? + 3v? if and only if p = 3, or p= 11 
(mod 24). 


Automorphisms of binary quadratic forms. 


Exercise 12.6.5. Suppose that f ~ g via the transformation M and that G is the group of 
automorphisms of f. 
(a) Prove that M~'GM is the group of automorphisms of g. 
(b) Prove that MG is the set of transformations yielding g from f. 
(c) Deduce that there are w(d) automorphisms of every primitive quadratic form of discriminant 
d, where w(—3) = 6,w(—4) = 4, and w(d) = 2 for all other discriminants d < 0. 


Exercise 12.6.6. (a) If N = f(a,b), then N = f(—a,—b). If N = a? + b?, then N = 6? + 
(—a)? = (—a)?+(—b)? = (—b)?+a?. If N = a? +ab+6?, then find five other representations 
of N by the quadratic form x? + ry + y?. 
(b) Explain how these representations correspond to the automorphisms of the quadratic form. 
(c) Why did we not include N = (—a)? + b? in the representations in part (a)? 


Exercise 12.6.7. (a) Let a,6,7,6 be given integers for which ad — By = 1. Prove that ’, 6’ 
are integers for which ad’ — 8’y = 1 if and only if there exists an integer k such that 


( &)=G iG 4). 


(b) If A = f(a,y) with (a,y) = 1, then prove that there exists a unique pair of integers 8,6 
such that f ~ [A,B,C] using the matrix M = C i) € SL(2,Z) for some integer B in 
the range -A< B<A. 

(c) Deduce that the proper representations of the integer A by reduced binary quadratic forms 
of discriminant d are in w(d)-to-1 correspondence with the solutions to B? = d (mod 4A) 
with -A<B<A. 


Exercise 12.6.8. Let f1,...,f, be the h = h(d) distinct reduced binary quadratic forms of 
discriminant d, where d = 0 or 1 (mod 4). Let r;(A) denote the number of proper representations 
of A by f;. Prove that 


BAe aS 5w(d)- utB’ (mod 442 BPS a Cned 2A)} 
: d 
and that this equals w(d)- TT, 4 (1 + (4)) unless perhaps 4|(A, d). 


Exercise 12.6.9. Suppose that p is an odd prime for which (d/p) = 1. Prove that p is properly 
represented either by only the principal form of discriminant d, or by only two non-principal, 


reduced, binary quadratic forms of discriminant d, one, say, ax? + bry+cy?, the other ax? — bry + 


cy?. 


Transformations of the upper half-plane. Let H := {z € C : Im(z) > 0} be 
the upper half-plane. We consider transformations with M = SL(2,Z) acting on 
z €C by taking MW (3) = (*) and considering this to be the map z > u/v. In 


Theorem [2] we saw that that every matrix in SL(2,Z) can be represented as a 


product of the two fundamental matrices S' = ( and T = (2, a 


Exercise 12.6.10. Prove that S represents the transformation z > z+ 1 and that T represents 
the transformation z > —1/z. 


Questions on transformations of the upper half-plane 455 


We define 
1 1 1 
Fai €C: |z| > land — 5 < Re(z) < shufzec : |z| = land — 5 < Re(z) < of. 


Figure 12.1. The shaded region is the fundamental domain F CH. 


Exercise 12.6.11.' Prove that the binary quadratic form ax? + bry + cy? with discriminant 
d <0 is reduced if and only if a EF. 


Exercise 12.6.12.1 Prove that for every z € C there exists M € SL(2,Z) such that Mz € F. 
Prove that M is unique. 


Exercise 12.6.13.4 Show that {MF: M € SL(2,Z)} is a partition of H into disjoint sets. 


The shaded region is F. Each enclosed region is a 
domain MF for some M € SL(2,Z). 


Appendix 12A. Composition 
rules: Gauss, Dirichlet, 
and Bhargava 


We study generalizations of the identity (9.1.1), which leads to a notion of “multi- 
plying” binary quadratic forms together, and hence to the group structure discov- 
ered by Gauss. We go on to study the reformulations of Dirichlet and Bhargava. 


12.7. Composition and Gauss 


In (9.1.1) we see that the product of any two integers represented by the binary 
quadratic form x? +y? is also an integer represented by that binary quadratic form. 
We now look for further such identities. One easy generalization is given by 


(12.7.1) (u? + Dv?)(r? + Ds?) = 2? + Dy® where x = ur + Dus and y = us — vr. 


Therefore the product of any two integers represented by the binary quadratic form 
x? + Dy? is also an integer represented by that binary quadratic form. For general 
diagonal binary quadratic forms (that is, having no “cross term” bay) we have 


(12.7.2) (au? + cv?) (ar? + cs”) = x? + acy” where x = aur+cvs and y = us— vr. 


Notice here that the quadratic form on the right-hand side is different from those on 
the left; that is, the product of any two integers represented by the binary quadratic 
form ax? + cy” is an integer represented by the binary quadratic form x? + acy?. 


One can come up with a similar identity no matter what the quadratic form, 
though one proceeds slightly differently depending on whether the coefficient b is 
odd or even. The discriminant d = b? — 4ac has the same parity as b. If d is even, 


456 


12.7. Composition and Gauss 457 


then 
(12.7.3) (au? + buv + ev?) (ar* + brs + cs”) = 2? — —y’, 


where x = aur + Sur +us)+cus and y =rvu— su. 
If dis odd, then 


d—1 
(12.7.4) (au? + buv + ev?) (ar? + brs + cs”) = 2? + xy — ae 
where x = aur + our + Pus + cus and y=rvu— su. 


That is, the product of two integers represented by the same binary quadratic form 
can be represented by the principal binary quadratic form of the same discriminant. 


Exercise 12.7.1. (a) Prove that if n is represented by ax? + bry + cy?, then an is represented 
by the principal form of the same discriminant. 
(b) Suppose that d < 0. Deduce that if d is a square mod 4n, then there is a multiple an of n 
which is represented by the principal form of discriminant d, with 1 <a < /|d|/3. 
(c) We obtained the bound 1 <a< Jd] when d is even in section [9.6] Use that method to 
find a bound in the case that d is odd. 


What about the product of the values of two different binary quadratic forms? 
If d is even, we have 


(12.7.5) (au? + buv + ev?) (r? — 48?) = ax? + bry + cy’, 


where x = ur + S su + cvs and y = ur — asu— bus. 


If d is odd, then 
(12.7.6) (au? + buv + cv?)(r? +s — 4s?) = ax? + bry + cy’, 


where & = ur + ott su + cvs and y = ur — asu— phys. 


That is, the product of an integer that can be represented by a binary quadratic 
form f and an integer that can be represented by the principal binary quadratic 
form of the same discriminant can be represented by f. 


Exercise 12.7.2. Suppose that a is a prime and d = b? — 4ac is even. Let D = —d/4. 

(a) Show that if a divides r? + Ds?, then a divides either r + (b/2)s or r — (b/2)s. 

(b) Prove that if r?+ Ds? = an, then there exist integers X,Y for which n = aX?+bXY+cY?. 
If n is prime, then this result is true whether or not a is prime, but we will not prove that here. 
Assume though that is so. 

(c) Suppose that (d/p) = 1 and that ap is the smallest multiple of p that is represented by the 

principal form. Prove that a here must take the same value as in exercise 

(d) Prove that 1 < a < \/|d|/3 and then use exercises [[2-4.4] and [12.6.1{b) to prove that if 


p< /|d|/2, then a= p. 


What about two different binary quadratic forms with no particular structure? 
For example, 


(4u? + 3uv + 5v?)(3r? + rs + 6s”) = 2x? + xy + Oy? 


by taking x = ur — 3us — 2ur — 3us and y = ur+us+vr— vs. These are 
three inequivalent binary quadratic forms of discriminant —71. Gauss called this 
composition, that is, finding, for given binary quadratic forms f and g of the same 


458 Appendix 12A. Composition rules: Gauss, Dirichlet, and Bhargava 


discriminant, a third binary quadratic form h of the same discriminant for which 


f(u,v)g(r, 8) = A(x, y), 
where x and y are quadratic polynomials in u, v, r, and s. 


These constructions suggest many questions. For example, are the identities 
that we found for two given quadratic forms the only possibility? Could the prod- 
uct of two sums of two squares always equal the value of some entirely different 
quadratic form? When we are given two quadratic forms of the same discriminant, 
is it true that there is always some third quadratic form of the same discriminant 
such that the product of the values of the first two always equals a value of the 
third? That is, is there always a composition of two given binary quadratic forms of 
the same discriminant? If so, can we determine the third quadratic form quickly? 


Gauss proved that one can always find the composition of two binary quadratic 
forms of the same discriminant. The formulas above can mislead one into guessing 
that this is simply a question of finding the right generalization, but that is far 
from the truth. All of the examples, through to (12.7.6), are so explicit 
only because they are very special cases in the theory. In Gauss’s proof he had to 
prove that various other equations could be solved in integers in order to find h 
and the quadratic polynomials x and y (which are polynomials in u, v, r, and s). 
This was so complicated that some of the intermediate formulas took two pages to 
write down and are very difficult to make sense of We will prove Gauss’s theorem 
though we will approach it in a somewhat different way. 

Exercise 12.7.3. Given non-zero integers a, b,c,d prove that there exist integers m,n such that 


the set of integers that can be represented by (ar + bs)(cu + dv) as r,s,u,v run over the integers 
is the same as the set of integers that can be represented by mx+ny as x, y run over the integers. 


We finish this section by presenting a fairly general composition. 


Proposition 12.7.1. Suppose that ax? + bjxy + ciy? fori =1,2 are binary qua- 
dratic forms of discriminant d such that q = (a1, 42) divides tbe Then 


(12.7.7) (aya? + brary, + cry?) (a2x3 + boxeye + cays) = a3x3 + b323y3 + c3y3 


where a3 = a1a2/q? and bg is any integer simultaneously satisfying the following 
(solvable) set of congruences: 


b= (mod 4a,a2/q"), 
bg =b, (mod 2a,/q), bg = bz (mod 2a2/q), 
b3 (by + bz) = by be +d (mod 4da,a2/q), 


and cz is chosen so that the discriminant of a3x3 + b3x3y3 + c3y3 is d. 


Exercise 12.7.4. Show that the above congruences for b3 can be solved. 


Proposition [12.7.1] implies that we can always compose two binary quadratic 
forms f and g of the same discriminant, whose leading coefficients are coprime. 


*See article 234 and beyond in Gauss’s book Disquisitiones Arithmeticae (1804). 


12.8. Dirichlet composition 459 


Proof sketch. Computer software verifies that ([2.7.7) holds taking a3 = a,a2/q’, 
for any integer q dividing (a1, a2), with 


a ere. aa ae y ta Pe bile Fd —Oabi— Osho 
3 = X22 2a2/q 1¥2 4 2a,/4 2Y1 4a,a9/q Y1Y2; 
ay ag by + be 
and Y3 = *%1Yy2 44 ‘oy st “Y1Y2- 
q q 2q 


To ensure that we are always working with integers, the coefficients of x3 and y3 
must be integers. So this formula works if we can find integers gq and b3 for which q 
divides a1, a2, and fytbe | and the above four congruences hold simultaneously for 
integer bs. It is difficult to determine whether there is such a b3 for an arbitrary q, 

bi +b2 


but not so challenging if q = (a1, a2) divides “>. 


Corollary 12.7.1. For any given integers a,b,c,h,k we have 
(ab, hk, ch) - (ac, hk, bh) ~ (ah, hk, bc). 


Proof. We multiply (ab, hk, ch) and (ac, hk, bh) ~ (bh, —hk, ac) using the proof of 
Proposition [12.7.1] We take q = b so that a3 = ah and 2q|b, + bo = 0. Selecting 
bz; = hk we find that the congruences of Proposition [12.7.1] reduce to d = (hk)? 
(mod 4abh), which follows from d = (hk)? —4abch. Hence we have that (ab, hk, ch)- 
(ac, hk, bh) ~ (ab, hk, ch) - (bh, —hk, ac) ~ (ah, hk, bc). 

To get more symmetry in the statement of the result we note that (ah, hk, bc) - 
(bc, hk, ah) = 1, and so 


(ab, hk, ch) - (ac, hk, bh) - (bc, hk, ah) ~ 1. 


12.8. Dirichlet composition 


Dirichlet claimed that when he was a student, working with Gauss, he slept with 
a copy of Disquisitiones under his pillow every night for three years. It worked, 
as Dirichlet found a way to better understand Gauss’s proof of composition, which 
amounts to a straightforward algorithm to determine the composition of two given 
binary quadratic forms f and g of the same discriminant. 


Exercise 12.8.1. Given any primitive binary quadratic form f(z, y) € Z[x, y] and non-zero inte- 
ger A, prove that there exist integers r and s such that f(r,s) is coprime to A. Deduce that there 
exists a binary quadratic form g, for which f ~ g, with (g(1,0), A) = 1. 


Exercise 12.8.2. Suppose that f(x,y), F(X,Y) are two binary quadratic forms, with disc(f) = 
disc(F) (mod 2), for which f(1,0) = a is coprime to F(1,0) = A. Prove that there exist quadratic 
forms g = ax? + bry + cy? and G = AX? + bXY + CY? with the same middle coefficient, such 
that f~g and F~G. 


Now suppose we begin with two quadratic forms of the same discriminant. 
Let A be the leading coefficient of one of them. Then the other is equivalent to 
a quadratic form with leading coefficient a, for some integer a coprime to A, by 
exercise [12.8.1] Then these are equivalent to quadratic forms g = ax? + bry + cy? 
and G= AX?+bXY +CY’, respectively, by exercise [12.8.2] Since these have the 


460 Appendix 12A. Composition rules: Gauss, Dirichlet, and Bhargava 


same discriminant we deduce that ac = AC and so there exists an integer h for 
which 


g(x,y) = ax? +bry+ Ahy* and G(x,y) = Az? + bay + ahy’. 
Then 
H(m,n) = g(u, v)G(r, s) with H(a, y) = aAx? + bry + hy’, 
where m = ur — hus and n = aus + Avr + bus. 


Dirichlet went on to interpret this in terms of what we would today call ideals; and 
this in turn led to the birth of modern algebra by Dedekind. In this theory one is 
typically not so much interested in the identity, writing H as a product of g and G 
(which is typically very complicated and none too enlightening), but rather in how 
to determine H from g and G. Dirichlet’s proof goes as follows: 


The ideal J (eee. a) is associated to a given binary quadratic form az? + 


bry +cy? (see section [[2.10)of appendix 12B). Therefore when we multiply together 
g and G, we multiply together their associated ideals to obtain 


rat (") (ee), 
2 2 


which contains aA as well as both a- hive and A- apie Since (a, A) = 1 there 
exist integers r,s for which ar + As = 1 and so our new ideal contains 


—b+Vd —b+Vd  -b+vd 
a: —~——+s-A: = : 
2 2 2 
Therefore _ 
r=1(%4 aa) 


which is the ideal associated with the binary quadratic form H. 


Defining the class group. We now know that we can multiply together the 
values of any two quadratic forms of the same discriminant and get another. Since 
there are only finitely many equivalence classes of binary quadratic forms of a given 
discriminant this might seem to lead to a group structure, under multiplication. 
To prove this we will need to know that the usual group properties hold (most 
importantly, associativity), and also that the values of a binary quadratic form 
classifies the form. Unfortunately this is not quite true. In exercise we 
saw that the only issue in distinguishing between the values taken by forms is 
perhaps the values taken by ax? + bry + cy” and au? — buv + cv?. However there 
is an automorphism u = x,v = —y between their sets of values so they cannot be 
distinguished in this way. On the other hand, the ideals 


(4) and (4) 


2 


are quite distinct, and so multiplying ideals (and therefore forms) using Dirichlet’s 
technique leads one immediately to being able to determine a group structure. This 
is called the class group, since the group acts on equivalence classes of ideals (and 


12.9. Bhargava composition 461 


so of forms). In this approach, associativity follows easily, as multiplication of the 
numbers in the ideals multiply associatively, and it is similarly evident that the 
class group is commutative. Therefore the class group is a commutative group, 
acting on the ideal classes of a given discriminant, with identity element given by 
the class of principal ideals (which correspond to the principal form). 

We will now give a useful criterion to determine how to take square roots inside 
the class group. 
Proposition 12.8.1. If f is a binary quadratic form of fundamental discriminant 
d which represents the square of an odd integer, then there exists a binary quadratic 
form g of discriminant d for which g.g~ f. 


Proof. We begin by squaring the primitive form ax? + bry + acy”. Then 


2 
J:=!1 Gena 


contains a?,a- abt and (=btva)2 = a7 b(=btv4), Therefore J contains 


a: =btvd and b- =btvd Now (a,b) = 1 or else our original form was not primitive, 


and so J contains tid Therefore 


ra1(2 a) 


and the corresponding binary quadratic form is a?x? + bry + cy’. 
One can justify this by finding a suitable multiplication of forms, namely, 
(ar? + brs + acs”)(au? + buv + acu”) = a?a? + bry + cy’, 
where « = ru—csv and y = asu+ arv + bsv. 


Now if f represents a? with (a,d) = 1, then there exist integers b,c such 
that the quadratic form F := a?x? + bry + cy? is equivalent to f. Note that 
(a, b)? divides d = b? — 4a?c, which is a fundamental discriminant and so squarefree 
except perhaps a power of 2. However a is odd and so (a,b) = 1. Therefore we let 
g = ax? + bry + acy? so that, as in the previous paragraph g-g ~ F ~ f. 


12.9. Bhargava composition] 
Let us begin with one further explicit composition, a tiny variant on (12.7.3) (letting 
s — —s there): 
(au? + 2Buv + cv’) (ar* — 2Brs + cs”) = x? + (ac — B?)y? 
where x = aur + B(vur — us) — cvs and y= us + ur. 


Combining this with the results of the previous section suggests that if the discrim- 
inant d is divisible by 4 (which is equivalent to b being even), then 


(12.9.1) F(u,v)G(r, s)H(m, —n) = P(x, y) 


5 Although there is no Nobel Prize in mathematics, there is the Fields Medal, awarded every four 
years, only to people 40 years of age or younger. In 2014, in Korea, one of the laureates was Manjul 
Bhargava for a body of work that begins with his version of composition, as discussed here, and allows 
us to much better understand many classes of equations, especially cubic. 


462 Appendix 12A. Composition rules: Gauss, Dirichlet, and Bhargava 


where P(x, y) = 2? — 4y? is the principal form and z and y are cubic polynomials 


in m,n,r,s,u,v. Analogous remarks can be made if the discriminant is odd. 

In 2004 Bhargava came up with an entirely new way to find all of the triples 
FG, H of binary quadratic forms of the same discriminant for which (12.9.1) holds: 
We begin with a 2-by-2-by-2 cube, the corners of which are labeled with the integers 
a,b,c, d,e, f,g,h. 


41 | 


b 


Figure 12.2. Bhargava’s Rubik-type cube. 


There are six faces of a cube, and these can be split into three parallel pairs. To 
each such parallel pair consider the pair of 2-by-2 matrices given by taking the 
entries in each face, those entries corresponding to opposite corners of the cube, 
always starting with a. Hence we get the pairs 


_{@ 0) fe Ff\, _ faxtey bat fy 
Mi(#,9) = (° i) = (; i) — (“ + gy dx+hy)]’ 
{ae b d\ — fax+by ca+dy 
Ma(z, y) —_ (° ‘a+ . us ea fy gx + i ’ 
fa b c d\ — fax+cy b«e+dy 
ae (’ ') = (; i) a & +gy fat i) 
where we have, in each, appended the variables, x,y, to create matrix functions of 
x and y. The determinant, —Q,(z,y), of each M;(a, y) is a quadratic form in x and 
y. Incredibly Qi, Q2, and Q3 all have the same discriminant and their composition 


equals P, the principal form, just as in (12.9.1). We present two proofs. First, by 
substitution, one can exhibit that 


Q(z, —y) = Q2(x2, y2)Q3(3, ys) 
where 


v= (exw) (Sa) (G2) oma a= (es w(5 i) (32) 


Let’s work though an example: Plot the cube in three dimensions, take the 
Cartesian coordinates of every corner (each 0 or 1), and then label the corner 


12.9. Bhargava composition 463 


(x,y,z), with 272 + 2y+ z, squared. Hence 
a, b, Cc, d, €é, ne g, h = oe, 6°, OR 4 a5, ie d*, BF 
yielding the cube in Figure 12.3. 


et 


52 


a a 


Figure 12.3. The construction of three binary quadratic forms using Bhar- 
gava’s cube. 


0 4 


This cube leads to three binary quadratic forms of discriminant —7 - 4*: 
Qy = —4?(4x?+132y+11y?), Qo = —2?(x?—2ay+29y), and Q3 = 4?(82?+52y+y?). 
After some work one can verify that 
Qi(m,n)Qa(r, 8)Qz(u, v) = A(x? +. 4° - Ty”), 
where x and y are the following cubic polynomials in m,n,r, s, u,v: 


x = 8(—llmru — 38mrv + 25msu + 17msv — 17nru — 4nrv + 59nsu + 32nsv) 


and y=mru+mrv + 2lmsu + 5msvu + 3nru + 2nrv + 3lnsu + bnsv. 


Bhargava proves his theorem, inspired by a 2-by-2-by-2 Rubik’s cube. His idea 
is to apply one invertible linear transformation at a time, simultaneously to a pair 
of opposite sides, and to slowly “reduce” the numbers involved, while retaining the 
equivalence classes of Qi, Q2, and Q3, until one reduces to a cube and a triple of 
binary quadratic forms with coefficients having a convenient structure. 


Lemma 12.9.1. Jf one applies an invertible linear transformation to a pair of 
opposite sides, then the associated binary quadratic form is transformed in the usual 
way, whereas the other two quadratic forms remain the same. 


Therefore we can act on our cube by such SL(2, Z)-transformations, in each 


direction, and the three binary quadratic forms each remain in the same equivalence 
class. 


Proof. If ce ) €SL(2, Z), then we replace the face 


ca) Caer gam Ga) & Carts a) 


464 Appendix 12A. Composition rules: Gauss, Dirichlet, and Bhargava 


Then M(x, y) gets mapped to 


Ce aor G Abert a+ G a)epe 


that is, Mi(ax+-yy, bx + dy). Therefore the quadratic form Q1(x, y) gets mapped 
to Qi(ax+yy, Bx + dy) which is equivalent to Q1 (x,y). Now Mo(za, y) gets mapped 


to 
aateB8 ca+gf ba+ fB da+thB\ — fa B 
eae ae (aes ear (° Mo(2, y); 


hence the determinant, —Q2(x, y), is unchanged. An analogous calculation reveals 


that Ms(x,y) gets mapped to & A M3(ax,y) and the determinant, —Q3(2, y), 


is also unchanged. 


The previous lemma allows one to proceed in “reducing” the three binary qua- 
dratic forms to equivalent forms that are easy to work with (rather as in Dirichlet’s 
proof). 


Proof of the Bhargava composition. We will simplify the entries in the cube 
by the following reduction algorithm: 

e We select the corner that is to be a so that a 4 0. 

e We will transform the cube to ensure that a divides b, c, and e. If not, say 
a does not divide e, then select integers a, 3 so that aa + e3 = (a,e), and then let 
y = —e/(a,e), 6 =a/(a,e). In the transformed matrix we have a’ = (a,e), e’ = 0, 
and 1 < a’ < a—1. It may well now be that a’ does not divide b’ or c’, so we repeat 
the process. Each time we do this we reduce the value of a by at least 1; and since 
it remains positive this can only happen a finite number of times. At the end of 
the process a divides 6, c, and e. 

e We will transform the cube to ensure that b = c =e = 0. We already have 
that alb,c,e. Now select a=1, 6 =0,y = —e/a, 6 =1, so that e’ = 0,b' =b,c = 
c. We repeat this in each of the three directions to ensure that b= c=e=0. 

Replacing a by —a, we have that the three matrices are 


My (2, y) = & ) x + (; 2) y, so that Qi(2,y) = ada* + ahay + fgy", 


Ma(x,y) := ei ) wie ( y, so that Qo(x,y) = agx” + ahay + dfy’, 
d 


M3(a2,y) = fe ) a+ (; y, so that Q3(x,y) = afax? + ahry + dgy?. 
All three Q; have discriminant (ah)? — 4adfg, and we observe that 
Qil(fyers + grays + hysys, ax2x3 — dyoys) = Qo(®2, y2)Q3(X3, ys) 


where x1 = fy2x%3 + gx2y3 + hygy3 and y; = axex3 — dy2y3. 


This brings to mind the twists of the Rubik’s cube, though in that case one has 
only finitely many possible transformations, whereas here there are infinitely many 
possibilities, as there are infinitely many invertible linear transformations over Z. 


Appendix 12B. The class 
group 


12.10. A dictionary between binary quadratic forms and ideals 


In section [12.8] of appendix 12A we were reminded that we associate the ideal 
Iz(2a, —b+ Vd) = {2ax + (—b+ Vd)y : 2, ye Zh 


with the binary quadratic form ax? + bry + cy”, denoted [a,b,c]. To see why, note 
that if we multiply any element of the ideal with its conjugate, we obtain 


(Qax + (—b + Vd)y) - (2ax + (—b — Vd)y) = 4a(ax? + bay + cy?). 


We now investigate how the key operations of equivalence and composition translate 
into the ideal setting: 


e All equivalences of binary quadratic forms can be broken down into the two 
linear transformations x > «+ky, yoy, ke Z, anda -y, yo 2: 


— The linear transformation « > «+ky, y— y, k € Z converts the linear 
form to 


2a(x + ky) + (—b+ Vd)y = 2ax + (—B + Vad)y; 


that is, (2a,-b+V/d) = (2a, -B+V/d), where B = b—2ak. This new ideal 
corresponds to the binary quadratic form [a, B,c], which is equivalent to 
(a, b,c], via the inverse of this linear transformation. 


— The linear transformation x — —y, y — x converts the linear form to 
2a(—y) + (—b + Vd)a. If we multiply this through by b + Vd, we obtain 


(b+ Vad) - (2a(—y) + (—b + Vd)x) = —2a(b + Vd)y + (d— b*)x 
= —4dacr — 2a(b + Vd)y 
= —2a(2ca + (b+ Vad)y), 


465 


466 Appendix 12B. The class group 


so that (b+ Vd) - (2a,—b + Vd) = 2a- (2c,b+ Vd). This new ideal, 
(2c, b + Vd), corresponds to the binary quadratic form [c, —b, a] which is 
equivalent to [a,b,c], via the inverse of this linear transformation. 
It therefore makes sense to define: 
Ideals I and J in Q(Vd) are equivalent if there exist a, 8 € 
Q(Vd) such that al = BJ. 
We have shown that if f ~ g, then the ideals corresponding to f and g 
are equivalent. 


• In section 12.8 of appendix 12A we sketched Dirichlet’s proof that the composition of two binary quadratic forms of the same discriminant corresponds to the product of their ideals. Although composition is a rather mysterious operation, multiplication is not! We know that it is commutative and associative. We have seen that composition has an identity, 1 (the principal form, corresponding to the principal ideals), and has inverses, so that
$$(2a, -b+\sqrt{d})\cdot(2a, b+\sqrt{d}) \sim (1).$$


In Theorem we showed that the number of ideal classes (that is, the 
number of classes of ideals, modulo this equivalence relation) is finite, and 
therefore the ideal classes form a finite abelian group under multiplication, 
called the class group. 


Exercise 12.10.1. (a) Show that if a and b are represented by f of discriminant d, then ab is 
represented by the principal quadratic form of discriminant d. 
(b) Show that if a is represented by f of discriminant d, and b is represented by the principal 
quadratic form of discriminant d, then ab is represented by f. 
(c) Prove that if f(x,y) represents integers a, b, and c, then it represents abc. 


In section 12.5 we discussed Heegner’s Theorem exhibiting the nine fundamental discriminants $d<0$ of class number one. By the dictionary in this section, this shows that there are exactly nine fields of the form $\mathbb{Q}(\sqrt{d})$ with $d<0$ in which every ideal of the ring of integers, $R = \mathbb{Z}[\sqrt{d}]$, is principal. In other words, these are the only $d$ for which $R$ is a principal ideal domain, and therefore these are the only $d$ for which $\mathbb{Z}[\sqrt{d}]$ has unique factorization (as discussed in section 3.19 of appendix 3D).


12.11. Elements of order two in the class group 


We saw the benefit of studying the subgroup of squares, and their cosets, when working with the group of reduced residues mod $n$. Similarly, when $G$ is the class group, we let $G^2 := \{g^2 : g \in G\}$ and let $H$ be the set of elements in $G$ of order one or two; then $G^2$ and $H$ are subgroups of $G$ with $G/G^2 \cong H$.


As $[a,-b,c]$ is the inverse of $[a,b,c]$, we deduce that $[a,b,c]$ has order one or two if and only if $[a,b,c]\sim[a,-b,c]$, where, without loss of generality, $[a,b,c]$ is reduced. We proved that distinct reduced forms are inequivalent, so this can only happen if $b=0$ (so that the forms are identical) or if $[a,-b,c]$ is not reduced. Now if $d<0$ and $[a,b,c]$ is reduced, but $[a,-b,c]$ is not, then either $b=a$, or $b>0$ and $a=c$. One can verify that $[a,b,c]\sim[a,-b,c]$ in each of these cases via the transformations $x\mapsto x-y,\ y\mapsto y$, and $x\mapsto -y,\ y\mapsto x$, respectively. Therefore we have a set of




representatives for $H$:
$$\{[a,b,c] \text{ reduced of discriminant } d : b = 0 \text{ or } b = a \text{ or } a = c\}.$$
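For $d<0$ this description of $H$ turns directly into a computation. A minimal sketch, assuming $d<0$ is a discriminant and counting only primitive forms (the function names `reduced_forms` and `order_at_most_two` are hypothetical): it lists the reduced forms using the usual conditions $|b|\le a\le c$, with $b\ge 0$ when $|b|=a$ or $a=c$, and then keeps those with $b=0$, $b=a$ or $a=c$.

```python
from math import gcd

def reduced_forms(d):
    """Primitive reduced forms [a, b, c] of discriminant d < 0:
    |b| <= a <= c, with b >= 0 whenever |b| = a or a = c."""
    assert d < 0 and d % 4 in (0, 1)
    forms = []
    a = 1
    while 3 * a * a <= -d:            # reduced forms satisfy 3a^2 <= |d|
        for b in range(-a + 1, a + 1):
            if (b * b - d) % (4 * a):
                continue
            c = (b * b - d) // (4 * a)
            if c < a or (c == a and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
        a += 1
    return forms

def order_at_most_two(d):
    """Representatives of H: reduced forms with b = 0, b = a, or a = c."""
    return [(a, b, c) for (a, b, c) in reduced_forms(d)
            if b == 0 or b == a or a == c]

print(reduced_forms(-56))      # [(1, 0, 14), (2, 0, 7), (3, -2, 5), (3, 2, 5)]
print(order_at_most_two(-56))  # [(1, 0, 14), (2, 0, 7)]
```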


When $G$ is the group of reduced residues mod $p$, the Legendre symbol allows us to determine which coset of $G^2$ a given reduced residue belongs to. We now develop the same idea, modified to this setting. Corollary 12.3.1 tells us that if $N$ is coprime to $d$ and is properly represented by a binary quadratic form $f$ of discriminant $d$, then $\sigma_f(p) = \left(\frac{N}{p}\right)$ for every odd prime $p$ dividing $d$. Now suppose that $f$ and $g$ are two binary quadratic forms of discriminant $d$, and let $h = fg$ (that is, $h$ is the composition of $f$ and $g$, their product inside the class group). If $m$ is represented by $f$ and $n$ is represented by $g$, with $m$ and $n$ coprime to $d$, then $mn$ is represented by $fg = h$, and so
$$\sigma_h(p) = \left(\frac{mn}{p}\right) = \left(\frac{m}{p}\right)\left(\frac{n}{p}\right) = \sigma_f(p)\,\sigma_g(p).$$


This implies that $\sigma: G \to \{-1,1\}^r$, where $r$ is the number of odd prime factors of $d$, is actually a group homomorphism. In particular, if $f = g$ so that $h = f^2$, then $\sigma_h(p) = 1$ for all $p$, and so $h$ is in the kernel of the map; that is, $G^2 \subseteq \ker\sigma$. In fact one can prove that $G^2$ is the kernel of the map, and therefore the image
$$\{(\sigma_f(p))_{p\mid d} : f \in G\} \text{ is isomorphic to } G/G^2 \cong H.$$


If $d$ is even, then the value of $\sigma_f(2)$ can be determined by the $\sigma_f(p)$ with $p$ odd, as the following exercise shows.


Exercise 12.11.1. Use Corollary 12.3.1 and quadratic reciprocity to prove that $\prod_{p\mid d}\sigma_f(p) = 1$.


Exercise 12.11.2. Prove that if $f\sim g$, then $\sigma_f(2) = \sigma_g(2)$ when $d$ is even.


Exercise 12.11.1 implies that $\operatorname{Im}(\sigma) \subseteq \{(\delta_p)_{p\mid d} : \text{each } \delta_p \in \{-1,1\} \text{ and } \prod_{p\mid d}\delta_p = 1\}$, and one can prove that this is the image.
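For odd negative discriminants one can watch the map $\sigma$ at work. The sketch below makes several assumptions of its own: $d<0$ is odd (so $\sigma_f(2)$ never arises), $\sigma_f(p)$ is computed from whatever small value $m = f(x,y)$ coprime to $d$ the search finds first (Corollary 12.3.1 is what makes the choice irrelevant; a non-proper representation $m = g^2 m'$ gives the same symbol since $p\nmid g$), and all helper names are hypothetical. It compares the number of distinct vectors $(\sigma_f(p))_{p\mid d}$ with the number of representatives of $H$ found as above.

```python
from math import gcd

def reduced_forms(d):
    """Primitive reduced forms [a, b, c] of discriminant d < 0."""
    forms, a = [], 1
    while 3 * a * a <= -d:
        for b in range(-a + 1, a + 1):
            if (b * b - d) % (4 * a) == 0:
                c = (b * b - d) // (4 * a)
                if c >= a and not (c == a and b < 0) and gcd(gcd(a, b), c) == 1:
                    forms.append((a, b, c))
        a += 1
    return forms

def legendre(m, p):
    """Legendre symbol (m/p) for an odd prime p not dividing m (Euler's criterion)."""
    return 1 if pow(m, (p - 1) // 2, p) == 1 else -1

def odd_prime_factors(n):
    n, ps, q = abs(n), [], 3
    while n % 2 == 0:
        n //= 2
    while q * q <= n:
        if n % q == 0:
            ps.append(q)
            while n % q == 0:
                n //= q
        q += 2
    if n > 1:
        ps.append(n)
    return ps

def sigma(f, d):
    """The vector (sigma_f(p)) over odd primes p | d, from a value f(x, y) coprime to d."""
    a, b, c = f
    for x in range(20):
        for y in range(20):
            m = a * x * x + b * x * y + c * y * y
            if m > 0 and gcd(m, d) == 1:
                return tuple(legendre(m, p) for p in odd_prime_factors(d))
    raise ValueError("no small value of f coprime to d")

def genus_summary(d):
    """(number of distinct sigma-vectors, number of reduced forms with b = 0, b = a or a = c)."""
    forms = reduced_forms(d)
    vectors = {sigma(f, d) for f in forms}
    ambiguous = [f for f in forms if f[1] == 0 or f[1] == f[0] or f[0] == f[2]]
    return len(vectors), len(ambiguous)

for d in (-15, -35, -39, -195):
    print(d, genus_summary(d))
```

For $d=-39$ the class group has order 4 but only two vectors occur: the form $[3,3,4]$ lies in $G^2$ and so has trivial character, exactly as the argument above predicts.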


If $d$ is a discriminant for which $G^2$ contains only the identity element, then all elements of $G$ have order one or two. Therefore, for any given $(\delta_p)_{p\mid d} \in \operatorname{Im}(\sigma)$, there is a unique equivalence class of binary quadratic forms $f$ of discriminant $d$ for which $\sigma_f(p) = \delta_p$ for all primes $p \mid d$. We can therefore determine exactly which equivalence class of binary quadratic forms represents each given prime $p$ for which $(d/p) = 1$, using only Legendre symbols, by Corollary 12.3.1. These discriminants $d$ are Euler’s idoneal numbers discussed in section 12.5.


Finally we prove a result in the converse direction to exercise 12.3.1.


Exercise 12.11.3. Suppose that $f = [a,b,c]$ and $F = [A,B,C]$ are reduced, primitive binary quadratic forms of fundamental discriminant $d<0$ for which $\sigma_f(p) = \sigma_F(p)$ for all primes $p$ dividing $d$. Prove that $f$ and $F$ are equivalent over the rationals. (In other words, there exist $\alpha,\beta,\gamma,\delta\in\mathbb{Q}$ with $\alpha\delta-\beta\gamma = 1$ for which $F(x,y) = f(\alpha x+\beta y, \gamma x+\delta y)$.)


References for this chapter 


[1] J. W. S. Cassels, Rational quadratic forms, Dover, 1978. 

[2] Harvey Cohn, Advanced number theory, Dover, 1962.

[3] David A. Cox, Primes of the form $x^2 + ny^2$, Wiley, 1989.

[4] J.-P. Serre, A course in arithmetic, Graduate Texts in Mathematics 7, Springer-Verlag, 1973. 


Appendix 12C. Binary quadratic forms of positive discriminant


12.12. Binary quadratic forms with positive discriminant, and continued fractions


When $d>0$, Gauss defined $ax^2+bxy+cy^2$ to be reduced when
$$0 < \sqrt{d} - b < 2|a| < \sqrt{d} + b. \tag{12.12.1}$$
This implies that $0 < b < \sqrt{d}$, so that $|a| < \sqrt{d}$, and therefore there are only finitely many reduced forms of positive discriminant $d$. Since $b^2 - d = 4ac$, we have $ac = (b^2-d)/4 < 0$, so that $(\sqrt{d}-b)(\sqrt{d}+b) = d-b^2 = 4|a||c|$, and hence
$$0 < \sqrt{d} - b < 2|c| < \sqrt{d} + b;$$
so $ax^2+bxy+cy^2$ is reduced if and only if $cx^2+bxy+ay^2$ is.
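Forms $f(x,y) = ax^2+bxy+cy^2$ and $g(x,y) = cx^2+Bxy+Cy^2$ are neighbors (and equivalent) if they have the same discriminant and $b+B \equiv 0 \pmod{2c}$. They are equivalent under the transformation $\binom{x}{y} \mapsto \begin{pmatrix}0&-1\\1&k\end{pmatrix}\binom{x}{y}$ (that is, $x\mapsto -y$, $y\mapsto x+ky$), where $b+B = 2ck$.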


Gauss’s reduction algorithm goes as follows: Given $ax^2+bxy+cy^2$, we select a neighbor. Let $b'$ be the absolutely least residue of $-b \pmod{2c}$, so that $|b'| \le |c|$.
• If $|b'| > \sqrt{d}$, then let $B = b'$. Note that $0 < B^2 - d \le c^2 - d$. Therefore $|C| = (B^2-d)/4|c| < |c|/4$.
• If $|b'| < \sqrt{d}$, then select $B \equiv b' \equiv -b \pmod{2c}$ with $B$ as large as possible subject to $B < \sqrt{d}$. Therefore $B = -b + 2k|c|$, where $k = [\alpha_f]$ and $\alpha_f := \frac{b+\sqrt{d}}{2|c|}$. Note that $-d \le B^2 - d = 4cC < 0$.




  - If $2|c| > \sqrt{d}$, then $|C| \le |d/4c| < |c|$.
  - Otherwise $\sqrt{d} > 2|c|$ and $\sqrt{d} - 2|c| < B < \sqrt{d}$, and therefore $0 < \sqrt{d} - B < 2|c| < \sqrt{d} + B$, so that the neighbor is reduced.


We repeat this algorithm until we obtain a reduced form. This is guaranteed to terminate since if $f \sim g$ are neighbors as above, then $|c_g| = |C| < |c| = |c_f|$; that is, the absolute value of the final coefficient of the binary quadratic form, which is a nonzero integer, decreases at each step until the binary quadratic form is itself reduced.

For example, the quadratic form $[38,101,67]$ has discriminant 17. Here $b' = 33$, and so $|b'| > \sqrt{d}$; therefore we have $[38,101,67] \sim [67,33,4]$. For this next form $b' = -1$, and so $|b'| < \sqrt{d}$. However $2|c| > \sqrt{d}$, so we are in the second case of the algorithm and $[67,33,4] \sim [4,-1,-1]$. For this form, $b' = 1$ so that $|b'| < \sqrt{d}$, and $B = 3$. Now as $2|c| < \sqrt{d}$ we are in the third case of the algorithm and deduce that $[4,-1,-1] \sim [-1,3,2]$, which is reduced. What happens when we apply the algorithm to $[-1,3,2]$? Surprisingly, we find that $[-1,3,2]$ is equivalent to a different reduced form, $[2,1,-2]$, of discriminant 17. This is quite different from the $d<0$ case, where there was a unique reduced form in each equivalence class. We can repeat the algorithm, obtaining $[2,1,-2] \sim [-2,3,1]$, and then $[-2,3,1] \sim [1,3,-2]$, followed by $[1,3,-2] \sim [-2,1,2]$, $[-2,1,2] \sim [2,3,-1]$, and finally $[2,3,-1] \sim [-1,3,2]$, where we get back to our original reduced form. This is known as a cycle of reduced forms and can be written as
$$[-1,3,2]\sim[2,1,-2]\sim[-2,3,1]\sim[1,3,-2]\sim[-2,1,2]\sim[2,3,-1]\sim[-1,3,2],$$
or more succinctly as
$$-1\ \ {}^{3}\ \ 2\ \ {}^{1}\ \ {-2}\ \ {}^{3}\ \ 1\ \ {}^{3}\ \ {-2}\ \ {}^{1}\ \ 2\ \ {}^{3}\ \ {-1}. \tag{12.12.2}$$
We can guarantee that there are only finitely many forms in a cycle because there are only finitely many reduced forms of a given discriminant.
In general, if $[a_0,b_0,a_1] \sim [a_1,b_1,a_2] \sim [a_2,b_2,a_3] \sim \cdots$, then we can write
$$a_0\ \ {}^{b_0}\ \ a_1\ \ {}^{b_1}\ \ a_2\ \ {}^{b_2}\ \ a_3\ \cdots,$$
always selecting each successive $b_m$ by Gauss’s reduction algorithm and then letting $a_{m+1} = (b_m^2 - d)/4a_m$. For another example, when $d = 816$,
$$5\ \ {}^{26}\ \ {-7}\ \ {}^{16}\ \ 20\ \ {}^{24}\ \ {-3}\ \ {}^{24}\ \ 20\ \ {}^{16}\ \ {-7}\ \ {}^{26}\ \ 5\ \ {}^{24}\ \ {-12}\ \ {}^{24}\ \ 5\ \ {}^{26}\ \cdots,$$
which is a cycle of period 8.
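Gauss’s reduction algorithm as described above is short enough to run by machine. The following is a sketch under two assumptions of this example: $d>0$ is not a perfect square (so comparisons with $\sqrt{d}$ can be done exactly with `math.isqrt`), and ties $|b'| = |c|$ are resolved by taking $b' = |c|$. The function names are hypothetical. `reduce_form` follows the neighbor steps until the condition (12.12.1) holds, and `cycle` keeps stepping from a reduced form until it returns.

```python
from math import isqrt

def is_reduced(f, d):
    """Gauss's condition (12.12.1): 0 < sqrt(d) - b < 2|a| < sqrt(d) + b.
    Assumes d > 0 is not a square, so an integer t satisfies t < sqrt(d) iff t <= isqrt(d)."""
    a, b, c = f
    s = isqrt(d)
    return 0 < b <= s and 2 * abs(a) + b > s and 2 * abs(a) - b <= s

def neighbor(f, d):
    """One step of the reduction: [a, b, c] -> [c, B, C] with b + B = 0 (mod 2c)."""
    a, b, c = f
    m, s = 2 * abs(c), isqrt(d)
    bp = (-b) % m                  # residue of -b in [0, 2|c|)
    if bp > abs(c):
        bp -= m                    # absolutely least residue, |b'| <= |c|
    if bp * bp > d:                # case |b'| > sqrt(d): take B = b'
        B = bp
    else:                          # largest B = -b (mod 2|c|) with B < sqrt(d)
        k = (b + s) // m           # k = [alpha_f] = floor((b + sqrt d) / 2|c|)
        B = -b + k * m
    C = (B * B - d) // (4 * c)     # exact, since B^2 = b^2 (mod 4c)
    return (c, B, C)

def reduce_form(f, d):
    """Apply neighbor() until a reduced form is reached; return the whole path."""
    path = [f]
    while not is_reduced(path[-1], d):
        path.append(neighbor(path[-1], d))
    return path

def cycle(f, d):
    """The cycle of reduced forms containing the reduced form f."""
    forms, g = [f], neighbor(f, d)
    while g != f:
        forms.append(g)
        g = neighbor(g, d)
    return forms

print(reduce_form((38, 101, 67), 17))
# [(38, 101, 67), (67, 33, 4), (4, -1, -1), (-1, 3, 2)]
print(cycle((-1, 3, 2), 17))
# [(-1, 3, 2), (2, 1, -2), (-2, 3, 1), (1, 3, -2), (-2, 1, 2), (2, 3, -1)]
print([a for (a, b, c) in cycle((5, 26, -7), 816)])
# [5, -7, 20, -3, 20, -7, 5, -12]
```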


Exercise 12.12.1. Prove that every reduced form of positive discriminant has a unique reduced 
neighboring form to the left and a unique reduced neighboring form to the right. 


We can explain this surprising feature of forms of positive discriminant by understanding a remarkable connection between Gauss’s reduction algorithm and the continued fraction for $\alpha_f = \frac{b+\sqrt{d}}{2|c|}$. We showed that $k = [\alpha_f]$ and $\alpha_f - k = \frac{-B+\sqrt{d}}{2|c|}$, and so
$$(\alpha_f - k)\,\alpha_g = \frac{-B+\sqrt{d}}{2|c|}\cdot\frac{B+\sqrt{d}}{2|C|} = \frac{d-B^2}{4|cC|} = 1$$



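as $B^2 < d$ (so that $d - B^2 = -4cC = 4|cC|$), which implies that $\alpha_f = [k, \alpha_g]$. Thus we can trace Gauss’s reduction algorithm by following the continued fraction algorithm. For instance, in our first example above we look at the continued fraction of $\frac{33+\sqrt{17}}{8} = [4, 1, \overline{1, 1, 3}]$. Then, in the notation of appendix 11B, $\alpha = \alpha_0$ and $\alpha_5 = \alpha_2$ with
$$\alpha_0 = \frac{33+\sqrt{17}}{8},\quad \alpha_1 = \frac{-1+\sqrt{17}}{2},\quad \alpha_2 = \frac{3+\sqrt{17}}{4},\quad \alpha_3 = \frac{1+\sqrt{17}}{4},\quad \alpha_4 = \frac{3+\sqrt{17}}{2}.$$
The cycle length here is 3, not 6, since $\alpha_2 = \alpha_5$ corresponds to the quadratic form $[-1,3,2]$ as well as $[1,3,-2]$. The linear transformations that give rise to each successive equivalence in the cycle (12.12.2) are
$$\begin{pmatrix}0&-1\\1&1\end{pmatrix},\ \begin{pmatrix}0&-1\\1&-1\end{pmatrix},\ \begin{pmatrix}0&-1\\1&3\end{pmatrix},\ \begin{pmatrix}0&-1\\1&-1\end{pmatrix},\ \begin{pmatrix}0&-1\\1&1\end{pmatrix},\ \begin{pmatrix}0&-1\\1&-3\end{pmatrix}. \tag{12.12.3}$$
The entry in the bottom right-hand corner of each matrix comes, up to sign, from the periodic part of the continued fraction for $\alpha$.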
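The correspondence between the reduction steps and the continued fraction can be checked with exact integer arithmetic. This sketch (function name hypothetical) assumes $d>0$ is not a square and writes each complete quotient as $(P+\sqrt{d})/Q$ with $Q \mid d-P^2$, the standard way to run the continued fraction algorithm on a quadratic irrational. It then compares the complete quotients of $\frac{33+\sqrt{17}}{8}$ with the numbers $\alpha_f = \frac{b+\sqrt{d}}{2|c|}$ for the forms $[a,b,c]$ met in the example, as listed in the text.

```python
from math import isqrt

def quadratic_cf(P, Q, d, n_terms):
    """Continued fraction of (P + sqrt d)/Q for d > 0 not a square and Q | d - P^2.
    Returns the partial quotients and the complete quotients (P_n, Q_n),
    where the n-th complete quotient is (P_n + sqrt d)/Q_n."""
    assert Q != 0 and (d - P * P) % Q == 0
    s = isqrt(d)
    partial, complete = [], []
    for _ in range(n_terms):
        complete.append((P, Q))
        # floor((P + sqrt d)/Q), computed exactly because sqrt d is irrational
        a = (P + s) // Q if Q > 0 else -((P + s) // -Q) - 1
        partial.append(a)
        P = a * Q - P
        Q = (d - P * P) // Q          # exact division, Q stays a divisor of d - P^2
    return partial, complete

partial, complete = quadratic_cf(33, 8, 17, 8)
print(partial)    # [4, 1, 1, 1, 3, 1, 1, 3]

# Forms from the example: [67, 33, 4] onwards, then the cycle of [-1, 3, 2].
forms = [(67, 33, 4), (4, -1, -1), (-1, 3, 2), (2, 1, -2),
         (-2, 3, 1), (1, 3, -2), (-2, 1, 2), (2, 3, -1)]
print(complete == [(b, 2 * abs(c)) for (a, b, c) in forms])   # True
```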




12.13. The set of automorphisms 


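If $M = \begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$ is an automorphism of $ax^2+bxy+cy^2$, and if $z$ is a root of $ax^2+bx+c$, then $Mz = z$, where $Mz := \frac{\alpha z+\beta}{\gamma z+\delta}$. Therefore $\gamma z^2 + (\delta-\alpha)z - \beta = 0$, which implies that there exists an integer $v$ such that
$$\gamma = av,\quad \delta-\alpha = bv,\quad -\beta = cv.$$
Now if $u = \delta+\alpha$, then
$$u^2 - dv^2 = u^2 - (bv)^2 + 4(av)(cv) = (\delta+\alpha)^2 - (\delta-\alpha)^2 - 4\gamma\beta = 4(\alpha\delta-\beta\gamma) = 4,$$
and the automorphism takes the form
$$\begin{pmatrix}\frac{u-bv}{2} & -cv\\ av & \frac{u+bv}{2}\end{pmatrix}. \tag{12.13.1}$$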

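We found all solutions to Pell’s equation $u^2 - dv^2 = 4$ in appendix 11B, so we can determine all of the automorphisms of $[a,b,c]$ from (12.13.1). In fact the solutions with $u, v > 0$ are all given by $\frac{u_n+v_n\sqrt{d}}{2} = \left(\frac{u+v\sqrt{d}}{2}\right)^n$ for $n\ge1$, where $u, v$ gives the smallest solution to Pell’s equation with $u, v > 0$, and then
$$\begin{pmatrix}\frac{u_n-bv_n}{2} & -cv_n\\ av_n & \frac{u_n+bv_n}{2}\end{pmatrix} = \begin{pmatrix}\frac{u-bv}{2} & -cv\\ av & \frac{u+bv}{2}\end{pmatrix}^n.$$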


Going back to our first example, the product of the six matrices in (12.12.3) is $\begin{pmatrix}-9&32\\16&-57\end{pmatrix}$, mapping $[-1,3,2]$ to itself, which can be presented as minus (12.13.1), with $a=-1$, $b=3$, $c=2$ and $u=66$, $v=16$ (and $u^2-17v^2 = 4$).
On the other hand, the morphism of $[a,b,c]$ to $[-a,b,-c]$ given by the product of the first three matrices in (12.12.3) has transformation matrix
$$\begin{pmatrix}\frac{u-bv}{2} & cv\\ av & -\frac{u+bv}{2}\end{pmatrix}$$
with $a=-1$, $b=3$, $c=2$ and $u=8$, $v=2$. Notice here that $u^2-17v^2 = -4$, a solution to the negative Pell equation, and that $\frac{8+2\sqrt{17}}{2} = 4+\sqrt{17} > 1$. One can verify that
$$\left(\frac{8+2\sqrt{17}}{2}\right)^2 = 33+8\sqrt{17} = \frac{66+16\sqrt{17}}{2}.$$
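The matrix computations in this section can be reproduced directly. A sketch, with hypothetical helper names, assuming the convention used throughout this appendix that a matrix $M = \begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix}$ sends $f(x,y)$ to $f(\alpha x+\beta y, \gamma x+\delta y)$, so that successive steps compose by multiplying the matrices left to right. The signs of the bottom-right entries are taken from $b+B = 2ck$ for each step of the cycle.

```python
def transform(f, M):
    """Coefficients of f(alpha*x + beta*y, gamma*x + delta*y), M = ((alpha, beta), (gamma, delta))."""
    a, b, c = f
    (al, be), (ga, de) = M
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)

def matmul(M, N):
    """Product of two 2x2 integer matrices."""
    return tuple(tuple(sum(M[i][t] * N[t][j] for t in range(2)) for j in range(2))
                 for i in range(2))

def step(k):
    """Matrix of the neighbor step x -> -y, y -> x + k*y."""
    return ((0, -1), (1, k))

def automorphism(f, u, v):
    """The matrix (12.13.1), assuming u^2 - d v^2 = 4."""
    a, b, c = f
    return (((u - b * v) // 2, -c * v), (a * v, (u + b * v) // 2))

f = (-1, 3, 2)
P = ((1, 0), (0, 1))
for k in [1, -1, 3, -1, 1, -3]:        # bottom-right entries of the matrices in (12.12.3)
    P = matmul(P, step(k))
print(P)                                # ((-9, 32), (16, -57))
print(transform(f, P))                  # (-1, 3, 2)
print(automorphism(f, 66, 16))          # ((9, -32), (-16, 57))
```

Squaring `automorphism(f, 66, 16)` gives `automorphism(f, 4354, 1056)`, matching $\left(\frac{66+16\sqrt{17}}{2}\right)^2 = \frac{4354+1056\sqrt{17}}{2}$, as the power formula above predicts.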


Appendix 12D. Sums of three squares


We have seen which integers are representable as the sum of two squares. What 
integers are representable as the sum of three squares? 


12.14. Connection between sums of 3 squares and h(d)


We may restrict our attention to integers that are not divisible by 4: Since the only 
squares mod 4 are 0 and 1, if n is divisible by 4 and is the sum of three squares, then 
all three squares must be even. Hence if $n = 4m$, then to obtain every representation
of n as the sum of three squares, we just take every representation of m as the sum 
of three squares, and double the numbers that are being squared. 


The only squares mod 8 are 0, 1, and 4. Therefore no integer $\equiv 7 \pmod 8$ can be written as the sum of three squares (of integers). By the previous remark, no integer of the form $4^k(8m+7)$ can be written as the sum of three squares. Legendre proved that these are the only exceptions (we will not prove this here, as all known proofs are too complicated for a first course):


Theorem 12.5 (Legendre’s Theorem, 1798). A positive integer $n$ can be written as the sum of three squares of integers if and only if $n$ is not of the form $4^k(8m+7)$.
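Legendre’s Theorem is easy to test by brute force on small integers. This sketch (function names hypothetical) gives evidence only, not a proof: it compares a direct search for $n = x^2+y^2+z^2$ with the criterion "$n$ is not of the form $4^k(8m+7)$".

```python
from math import isqrt

def is_sum_of_three_squares(n):
    """Brute-force search for n = x^2 + y^2 + z^2 with 0 <= x <= y."""
    for x in range(isqrt(n) + 1):
        for y in range(x, isqrt(n - x * x) + 1):
            r = n - x * x - y * y
            z = isqrt(r)
            if z * z == r:
                return True
    return False

def is_legendre_exception(n):
    """True if n = 4^k (8m + 7)."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7

disagreements = [n for n in range(1, 2000)
                 if is_sum_of_three_squares(n) == is_legendre_exception(n)]
print(disagreements)   # []
```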


In section 11.2 we introduced Pell’s equation by asking for those triangular numbers that are also a square. The young Gauss was intrigued by which integers can be written as the sum of three triangular numbers; that is,
$$n = \frac{a(a+1)}{2} + \frac{b(b+1)}{2} + \frac{c(c+1)}{2}.$$
We can multiply through by 8 and add 3 to get the equivalent equation
$$8n+3 = (2a+1)^2 + (2b+1)^2 + (2c+1)^2.$$




On the other hand, if an integer $\equiv 3 \pmod 8$ is to be written as the sum of three squares, then those three squares must each be odd, and therefore we have this equation. In other words, we have proved: $n$ is the sum of three triangular numbers if and only if $8n+3$ is the sum of three squares. Legendre’s Theorem tells us that every integer $\equiv 3 \pmod 8$ is the sum of three squares, and so every positive integer is the sum of three triangular numbers. When Gauss discovered this in 1796 he wrote in his diary:


ΕΥΡΗΚΑ! num = Δ + Δ + Δ.
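Gauss’s observation can be checked for small $n$, together with the translation $8n+3 = (2a+1)^2+(2b+1)^2+(2c+1)^2$. A brute-force sketch (function names hypothetical); it illustrates the statement and does not prove it.

```python
def triangular(k):
    return k * (k + 1) // 2

def three_triangular(n):
    """Find a, b, c >= 0 with n = T(a) + T(b) + T(c), or return None."""
    ks = []
    k = 0
    while triangular(k) <= n:
        ks.append(k)
        k += 1
    index = {triangular(k): k for k in ks}      # T(k) -> k
    for a in ks:
        for b in ks:
            rest = n - triangular(a) - triangular(b)
            if rest in index:
                return a, b, index[rest]
    return None

a, b, c = three_triangular(20)
print((a, b, c), (2 * a + 1) ** 2 + (2 * b + 1) ** 2 + (2 * c + 1) ** 2 == 8 * 20 + 3)
```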


One might ask how many ways one can write an integer as the sum of three squares. Gauss proved the following remarkable theorem (for which there is still no easy proof): If $n$ is squarefree, the number of representations of $n$ as the sum of three squares is
$$\begin{cases} 24h(-n) & \text{if } n \equiv 3 \pmod 8,\ n>3,\\ 12h(-4n) & \text{if } n \equiv 1 \text{ or } 2 \pmod 4,\ n>1.\end{cases}$$
Here $h(-d)$ is the class number of $\mathbb{Q}(\sqrt{-d})$.
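Gauss’s formula can be checked numerically in the way the next paragraph describes. A sketch under stated assumptions: $h(D)$ is computed here as the number of primitive reduced forms of discriminant $D<0$, $r_3(n)$ counts ordered triples of integers with signs, and all function names are hypothetical. The check covers squarefree $n<400$, skipping $n=3$ and $n\equiv 7 \pmod 8$.

```python
from math import gcd, isqrt

def class_number(D):
    """h(D) for D < 0: the number of primitive reduced forms of discriminant D."""
    count, a = 0, 1
    while 3 * a * a <= -D:
        for b in range(-a + 1, a + 1):
            if (b * b - D) % (4 * a) == 0:
                c = (b * b - D) // (4 * a)
                if c >= a and not (c == a and b < 0) and gcd(gcd(a, b), c) == 1:
                    count += 1
        a += 1
    return count

def r3(n):
    """Number of (x, y, z) in Z^3 with x^2 + y^2 + z^2 = n."""
    m, count = isqrt(n), 0
    for x in range(-m, m + 1):
        for y in range(-m, m + 1):
            r = n - x * x - y * y
            if r >= 0:
                z = isqrt(r)
                if z * z == r:
                    count += 1 if z == 0 else 2     # z and -z
    return count

def squarefree(n):
    return all(n % (q * q) for q in range(2, isqrt(n) + 1))

def gauss_prediction(n):
    if n % 8 == 3:
        return 24 * class_number(-n)
    if n % 4 in (1, 2):
        return 12 * class_number(-4 * n)
    return None                                     # n = 7 (mod 8): no representations

failures = [n for n in range(2, 400)
            if squarefree(n) and n != 3 and gauss_prediction(n) is not None
            and r3(n) != gauss_prediction(n)]
print(failures)   # []
```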

Confronted by such a surprising formula I typically verify the claim by computing. In so doing I not only verified the above claim but found the following. I give it as an exercise, but I do not know how to prove it using only elementary methods!


Exercise 12.14.1.† If $p$ is a prime and $p = 8n+1$ or $8n+5$, then there are exactly $n$ solutions to $p^2 = a^2+b^2+c^2$ with $1\le a\le b\le c$.
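In the spirit of the remark about verifying by computing, here is a brute-force sketch (function name hypothetical) that counts the solutions of $p^2 = a^2+b^2+c^2$ with $1\le a\le b\le c$. It gives evidence for the exercise on small primes only.

```python
from math import isqrt

def count_ordered(p):
    """Number of (a, b, c) with 1 <= a <= b <= c and a^2 + b^2 + c^2 = p^2."""
    N = p * p
    count = 0
    for a in range(1, isqrt(N // 3) + 1):             # 3a^2 <= N
        for b in range(a, isqrt((N - a * a) // 2) + 1):  # 2b^2 <= N - a^2, so b <= c
            r = N - a * a - b * b
            c = isqrt(r)
            if c * c == r and c >= b:
                count += 1
    return count

for p in [5, 13, 17, 29]:
    print(p, p // 8, count_ordered(p))   # the exercise predicts count = p // 8
```

For example, $p = 17$ gives the two solutions $(1,12,12)$ and $(8,9,12)$.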


12.15. Dirichlet’s class number formula 


In 1832 Jacobi was interested in the sum
$$J_p = \sum_{k=1}^{p-1} k\left(\frac{k}{p}\right)$$
for odd primes $p$. This is the sum of the quadratic residues mod $p$ up to $p$, minus the sum of the quadratic non-residues mod $p$ up to $p$. Calculations yield that
$$J_3 = -1,\ J_5 = 0,\ J_7 = -7,\ J_{11} = -11,\ J_{13} = 0,\ J_{17} = 0,\ J_{19} = -19,\ J_{23} = -69,$$
and patterns begin to emerge. The two most obvious are that $J_p = 0$ if $p\equiv 1 \pmod 4$, and that $J_p$ seems to be divisible by $p$ if $p\equiv 3 \pmod 4$ and $p>3$.


Exercise 12.15.1. (a) Prove that $J_p = 0$ if $p\equiv 1 \pmod 4$.
(b) Prove that $J_p$ is divisible by $p$ if $p$ is a prime $>3$.


For $p\equiv 3 \pmod 4$ and $p>3$ we find that the values of $j_p := J_p/p$ are
$$j_7 = -1,\ j_{11} = -1,\ j_{19} = -1,\ j_{23} = -3,\ j_{31} = -3,\ j_{43} = -1,\ j_{47} = -5,\ j_{59} = -3,\ \dots.$$
Can you see any patterns and perhaps guess at the value? Write a program to get more data. At this point Jacobi did something unexpected: He computed the class number $h(-p)$ for primes $p\equiv 3 \pmod 4$ (see, e.g., section 12.4) and compared. Do you see a pattern now?




In 1832 Jacobi conjectured that the class number $h(-p)$, when $p\equiv 3 \pmod 4$ and $p>3$, is given by $-J_p/p$; that is,
$$h(-p) = -\frac{1}{p}\sum_{n=1}^{p-1}\left(\frac{n}{p}\right)n. \tag{12.15.1}$$
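Here is one possible answer to the invitation above to write a program. It is a sketch with hypothetical function names: it computes $J_p$ using Euler’s criterion for the Legendre symbol, counts primitive reduced forms to get $h(-p)$, and lists any primes $p\equiv 3 \pmod 4$, $7\le p<1000$, where (12.15.1) would fail.

```python
from math import gcd, isqrt

def legendre(k, p):
    """Legendre symbol (k/p) for an odd prime p not dividing k (Euler's criterion)."""
    return 1 if pow(k, (p - 1) // 2, p) == 1 else -1

def jacobi_sum(p):
    """J_p = sum_{k=1}^{p-1} k (k/p)."""
    return sum(k * legendre(k, p) for k in range(1, p))

def class_number(D):
    """h(D) for D < 0: the number of primitive reduced forms of discriminant D."""
    count, a = 0, 1
    while 3 * a * a <= -D:
        for b in range(-a + 1, a + 1):
            if (b * b - D) % (4 * a) == 0:
                c = (b * b - D) // (4 * a)
                if c >= a and not (c == a and b < 0) and gcd(gcd(a, b), c) == 1:
                    count += 1
        a += 1
    return count

def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

mismatches = [p for p in range(7, 1000)
              if is_prime(p) and p % 4 == 3
              and class_number(-p) != -jacobi_sum(p) // p]
print(mismatches)   # []
```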


In 1838 Dirichlet gave a proof of Jacobi’s conjecture and much more. His miraculous class number formula links algebra and analysis in an unforeseen way that was a foretaste of many of the most important subsequent works in number theory, including Wiles’s proof of Fermat’s Last Theorem. We will simply state the formulas here, which apply whenever $d$ is a fundamental discriminant. If $d>0$, then
$$h(d)\log\epsilon_d = \sqrt{d}\sum_{n\ge1}\frac{1}{n}\left(\frac{d}{n}\right).$$
If $-d<-4$, then
$$h(-d) = \frac{\sqrt{d}}{\pi}\sum_{n\ge1}\frac{1}{n}\left(\frac{-d}{n}\right).$$
It is not obvious that this is equivalent to (12.15.1) when $-d = -p \equiv 1 \pmod 4$, but this will be confirmed in section 14.8 of appendix 14A.

We encountered the infinite sum on the right-hand side of these formulas in section 8.17 in connection with Dirichlet’s work on primes in arithmetic progressions. It is not obvious that this sum converges, but we will give a proof that it converges in section 13.7. Now $h(d)\ge1$ for all $d$ (positive or negative) since $h(d)$ counts the classes of binary quadratic forms of discriminant $d$, and one always has the principal form. Hence the formulas imply that
$$\sum_{n\ge1}\frac{1}{n}\left(\frac{d}{n}\right) > 0,$$
which was a crucial step in the proof that there are infinitely many primes in arithmetic progressions (see section 8.17). By taking $h(d)\ge1$ in these formulas we obtain the lower bounds
$$\sum_{n\ge1}\frac{1}{n}\left(\frac{-d}{n}\right) \ge \frac{\pi}{\sqrt{d}}$$
for $-d<-4$ and, when $d>0$,
$$\sum_{n\ge1}\frac{1}{n}\left(\frac{d}{n}\right) \ge \frac{\log\epsilon_d}{\sqrt{d}} > \frac{\log d}{2\sqrt{d}}.$$
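The formula for $h(-d)$ can be tested numerically, although the series converges slowly. This sketch restricts to $d = p$ prime with $p\equiv 3 \pmod 4$, $p>3$, where (by quadratic reciprocity) the symbol $\left(\frac{-p}{n}\right)$ equals the Legendre symbol $\left(\frac{n}{p}\right)$ for every $n\ge1$. It truncates the series at a cutoff `N` of our own choosing, so it only approximates $h(-p)$; for these small $p$ the truncation error is below $10^{-2}$. The function names are hypothetical.

```python
from math import pi, sqrt

def legendre_table(p):
    """chi[n] = (n/p) for 0 <= n < p, p an odd prime (chi[0] = 0)."""
    squares = {x * x % p for x in range(1, p)}
    return [0] + [1 if n in squares else -1 for n in range(1, p)]

def dirichlet_estimate(p, N=200_000):
    """Truncation at N of (sqrt p / pi) * sum_{n >= 1} (n/p)/n."""
    chi = legendre_table(p)
    s = sum(chi[n % p] / n for n in range(1, N + 1))
    return sqrt(p) / pi * s

for p, h in [(7, 1), (23, 3), (47, 5)]:
    print(p, h, round(dirichlet_estimate(p), 3))
```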


In section 12.5 we saw that $h(-d) = 1$ for only a few values of $d$, the largest of which is 163. Dirichlet’s class number formula provides an explanation: We expect a lot of cancellation in our infinite sum, $\sum_{n\ge1}\frac{1}{n}\left(\frac{-d}{n}\right)$, so its value should be neither too large nor too small. Therefore we expect $h(-d)$ to typically be close to $\sqrt{d}$ (that is, within, say, a factor of 10), and so $h(-d)$ should be much larger than 1 once $d$ is sufficiently large.

How about $h(d)$ when $d>0$? We again expect a lot of cancellation in our infinite sum, $\sum_{n\ge1}\frac{1}{n}\left(\frac{d}{n}\right)$, so that we expect $h(d)\log\epsilon_d$ to be around $\sqrt{d}$. One of




$h(d)$ and $\log\epsilon_d$ could be small and the other large. As we mentioned in section of appendix 11B, calculations suggest that $\log\epsilon_d$ is usually around $\sqrt{d}$, so that we expect $h(d)$ to typically be “small”, perhaps even bounded by a constant. However there are values of $d$ for which $\epsilon_d$ is small. For example, if $d = m^2+1$ is squarefree, then $\epsilon_d = m+\sqrt{d}$, so for these discriminants we expect that $h(d)$ grows like $\sqrt{d}/\log d$ with $d$.


How often is $h(d) = 1$ when $d>0$? Here is a list of the fundamental discriminants $d$, $1<d<100$, for which $h(d) = 1$:⁹
 5, 8, 12, 13, 17, 21, 24, 28, 29, 33, 37, 41, 44,
 53, 56, 57, 61, 69, 73, 76, 77, 88, 89, 92, 93, 97;
that is, 26 of the 30 fundamental discriminants up to 100, a surprisingly high proportion. We believe that $h(p) = 1$ for roughly 3 out of every 4 primes $p\equiv 1 \pmod 4$, but this is an open conjecture, and we do not even know how to prove that there are infinitely many fundamental discriminants $d$, never mind primes, for which $h(d) = 1$.
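Exercise 12.15.2. Let $S := \sum_{n=1}^{(p-1)/2}\left(\frac{n}{p}\right)$ and $T := \sum_{n=1}^{(p-1)/2}\left(\frac{n}{p}\right)n$.
(a) Show that $S = 0$ when $p\equiv 1 \pmod 4$. Henceforth assume that $p\equiv 3 \pmod 4$ with $p>3$.
(b) Note that $\left(\frac{-n}{p}\right)(p-n) = \left(\frac{n}{p}\right)(n-p)$. Use this to evaluate the sum $\sum_{n=1}^{p-1}\left(\frac{n}{p}\right)n$ in terms of $S$ and $T$ by pairing up the $n$th and $(p-n)$th terms, for $n = 1, 2, \dots, \frac{p-1}{2}$.
(c) Do this taking $n = 2m$, $m = 1, 2, \dots, \frac{p-1}{2}$, to deduce from (12.15.1) that
$$h(-p) = \frac{1}{2-\left(\frac{2}{p}\right)}\sum_{n=1}^{(p-1)/2}\left(\frac{n}{p}\right).$$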


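Exercise 12.15.3. Let $p$ be an odd prime and $x \equiv \left(\frac{p-1}{2}\right)! \pmod p$. In exercise 7.4.3 we showed that $x^2 \equiv -\left(\frac{-1}{p}\right) \pmod p$.
(a) Prove that if $p\equiv 3 \pmod 4$, then $x \equiv (-1)^N \pmod p$, where $N = \sum_{n=1}^{(p-1)/2}\frac{1}{2}\left(1-\left(\frac{n}{p}\right)\right)$ is the number of quadratic non-residues mod $p$ in $[1, \frac{p-1}{2}]$.
(b) Show that if $p\equiv 3 \pmod 4$ and $p>3$, then $h(-p)$ is odd and $N \equiv \frac{h(-p)+1}{2} \pmod 2$, using exercise 12.15.2(c).
(c) Deduce that if $p\equiv 3 \pmod 4$ and $p>3$, then $\left(\frac{p-1}{2}\right)! \equiv (-1)^{\frac{h(-p)+1}{2}} \pmod p$.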


⁹Here we are looking at the class number of $\mathbb{Q}(\sqrt{d})$ as discussed in appendix 12B. For a fundamental discriminant $D$ we take $d = D/(D,4)$, so if $d\equiv 2$ or $3 \pmod 4$ in our list, then this means that there is just one equivalence class of binary quadratic forms of discriminant $4d$, namely the principal class.


Appendix 12E. Sums of four squares


12.16. Sums of four squares 


Next we discuss a beautiful and surprising theorem that Euler tried, and failed, to 
prove for many years. 


Theorem 12.6 (Lagrange’s Theorem). Every positive integer is the sum of four 
squares. 
Proof. We start from the identity 


$$(a^2+b^2+c^2+d^2)(u^2+v^2+w^2+x^2) = A^2+B^2+C^2+D^2, \tag{12.16.1}$$
where $A := au+bv+cw+dx$, $B := -av+bu-cx+dw$,
$C := -aw+bx+cu-dv$, $D := -ax-bw+cv+du$.


Much as for the sum of two squares, if we can show that every prime is the sum of 
four squares, then every integer, written as the product of its prime factors, is the 
sum of four squares by repeated applications of (12.16.1). 
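The identity (12.16.1) is what lets representations be multiplied together. A small sketch (function names hypothetical) implements $(A, B, C, D)$ exactly as defined above, checks the identity on a grid of small values, and combines $7 = 2^2+1^2+1^2+1^2$ with $3 = 1^2+1^2+1^2+0^2$ into a representation of 21.

```python
from itertools import product

def four_square_product(p, q):
    """Given p = (a, b, c, d) and q = (u, v, w, x), return (A, B, C, D) from (12.16.1)."""
    a, b, c, d = p
    u, v, w, x = q
    return (a * u + b * v + c * w + d * x,
            -a * v + b * u - c * x + d * w,
            -a * w + b * x + c * u - d * v,
            -a * x - b * w + c * v + d * u)

def norm(t):
    """Sum of the squares of the entries."""
    return sum(e * e for e in t)

rep = four_square_product((2, 1, 1, 1), (1, 1, 1, 0))
print(rep, norm(rep))   # (4, 0, -2, 1) 21
```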

Now $2 = 1^2+1^2+0^2+0^2$, so we focus on odd primes $p$. There exist integers $a, b, c, d$, not all zero, such that $a^2+b^2+c^2+d^2 \equiv 0 \pmod p$ by exercise 8.9.9(b) (taking $d = 0$). We select $a, b, c, d$ so that $mp = a^2+b^2+c^2+d^2$, where $m$ is the minimal such integer $\ge 1$. Our goal is to show that $m = 1$.


Exercise 12.16.1. Prove that we may take $|a|, |b|, |c|, |d| < p/2$, so that $m < p$.


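Exercise 12.16.2. Show that if $m$ is even, then we can reorder $a, b, c, d$ so that $a-b$ and $c-d$ are both even. Using the identity
$$\frac{m}{2}\,p = \left(\frac{a+b}{2}\right)^2 + \left(\frac{a-b}{2}\right)^2 + \left(\frac{c+d}{2}\right)^2 + \left(\frac{c-d}{2}\right)^2,$$
prove that $m$ must be odd.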




Let $u, v, w, x$ be the absolutely least residues of $a, b, c, d \pmod m$, respectively. Therefore $u^2+v^2+w^2+x^2 \equiv a^2+b^2+c^2+d^2 \equiv 0 \pmod m$. Moreover $|u|, |v|, |w|, |x| < m/2$ (since $m$ is odd), and so $u^2+v^2+w^2+x^2 < 4(m/2)^2 = m^2$. Hence we can write $u^2+v^2+w^2+x^2 = mn$ for some integer $n < m$.


Exercise 12.16.3. Prove that $A \equiv B \equiv C \equiv D \equiv 0 \pmod m$.


Since $A/m, B/m, C/m, D/m$ are integers by the last exercise, we obtain
$$(A/m)^2 + (B/m)^2 + (C/m)^2 + (D/m)^2 = \frac{(a^2+b^2+c^2+d^2)(u^2+v^2+w^2+x^2)}{m^2} = \frac{mp\cdot mn}{m^2} = np.$$
This contradicts the minimality of $m$ unless $n = 0$, in which case $u = v = w = x = 0$, so that $a \equiv b \equiv c \equiv d \equiv 0 \pmod m$, and then $m^2 \mid a^2+b^2+c^2+d^2 = mp$. Therefore $m = 1$ as $m < p$, which is what we wished to prove.


Exercise 12.16.4. Prove that no integer of the form $2^{2k+1}$ is the sum of three or four positive squares, and the only such representation of $2^{2k}$ is $(2^{k-1})^2 + (2^{k-1})^2 + (2^{k-1})^2 + (2^{k-1})^2$.


12.17. Quaternions 


The identity $(a^2+b^2)(u^2+v^2) = x^2+y^2$, where $x = au-bv$ and $y = av+bu$, which was used in the result about representations by the sum of two squares, is perhaps most naturally obtained by taking norms in the product $(a+ib)(u+iv) = x+iy$, where $i^2 = -1$. The identity (12.16.1) can be obtained by taking norms in the quaternions. This is a non-commutative ring. There are three square roots of $-1$ which do not commute with one another; specifically, we let $i, j, k$ be “imaginary numbers” such that
$$i^2 = j^2 = k^2 = -1,$$
$$ij = k = -ji,\quad jk = i = -kj,\quad ki = j = -ik.$$
The quaternions are the ring $\{a+bi+cj+dk : a,b,c,d \in \mathbb{Z}\}$. Now
$$\operatorname{Norm}(a+bi+cj+dk) := (a+bi+cj+dk)(a-bi-cj-dk) = a^2+b^2+c^2+d^2,$$
since $ij+ji = jk+kj = ki+ik = 0$. Moreover
$$(a+bi+cj+dk)(u-vi-wj-xk) = A+Bi+Cj+Dk$$
with $A, B, C, D$ defined as in (12.16.1). Our proof that primes can be written as the sum of four squares can now be translated into the language of quaternions.
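The quaternion rules above can be encoded directly. In this sketch a quaternion $a+bi+cj+dk$ is stored as the tuple `(a, b, c, d)` (a representation chosen here), and the helper names are hypothetical. It checks the multiplication rules for $i, j, k$, that $(a+bi+cj+dk)(u-vi-wj-xk)$ reproduces $A+Bi+Cj+Dk$ for one example, and that the norm is multiplicative on small quaternions.

```python
from itertools import product

# A quaternion a + bi + cj + dk is stored as the tuple (a, b, c, d).
def qmul(p, q):
    """Product using i^2 = j^2 = k^2 = -1, ij = k, jk = i, ki = j."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qconj(p):
    a, b, c, d = p
    return (a, -b, -c, -d)

def qnorm(p):
    """Norm(p) = p * conj(p), which is real."""
    return qmul(p, qconj(p))[0]

ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(I, J), qmul(J, I))                      # (0, 0, 0, 1) (0, 0, 0, -1)
print(qmul((1, 2, 3, 4), qconj((5, 6, 7, 8))))     # (70, 8, 0, 16) = (A, B, C, D)
```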


12.18. The number of representations 


Let $r_4(n)$ be the number of representations of $n$ as a sum of four squares. In 1834 Jacobi showed that $r_4(n) = 8\sigma(n)$ if $n$ is odd, and $r_4(n) = 24\sigma(m)$ if $n$ is even, writing $n = 2^k m$, $k\ge 1$, where $m$ is odd and $\sigma(n) = \sum_{d\mid n} d$.
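Jacobi’s formula is easy to compare with a direct count for small $n$. This brute-force sketch (function names hypothetical) computes $r_4(n)$ as the convolution $\sum_{t} r_2(t)\,r_2(n-t)$ of two-square counts, and compares it with $8\sigma(n)$ or $24\sigma(m)$.

```python
from math import isqrt

def r4(n):
    """Count (x, y, z, w) in Z^4 with x^2 + y^2 + z^2 + w^2 = n."""
    m = isqrt(n)
    r2 = [0] * (n + 1)              # r2[t] = #{(x, y) : x^2 + y^2 = t}
    for x in range(-m, m + 1):
        for y in range(-m, m + 1):
            t = x * x + y * y
            if t <= n:
                r2[t] += 1
    return sum(r2[t] * r2[n - t] for t in range(n + 1))

def sigma(n):
    return sum(d for d in range(1, n + 1) if n % d == 0)

def jacobi_r4(n):
    """8 sigma(n) for odd n; 24 sigma(m) for n = 2^k m with k >= 1 and m odd."""
    m = n
    while m % 2 == 0:
        m //= 2
    return 8 * sigma(n) if n % 2 else 24 * sigma(m)

print([r4(n) for n in range(1, 9)])         # [8, 24, 32, 24, 48, 96, 64, 24]
print([jacobi_r4(n) for n in range(1, 9)])  # the same list
```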


Exercise 12.18.1. Prove that this can be rewritten as follows:
$$\sum_{N\ge0} r_4(N)x^N = \Big(\sum_{n\in\mathbb{Z}} x^{n^2}\Big)^4 = 1 + 8\sum_{m\ge1}\Big(\frac{mx^m}{1-x^m} - \frac{4mx^{4m}}{1-x^{4m}}\Big).$$




Exercise 12.18.2. (a) Show that if $8 \mid N$ and $N = a^2+b^2+c^2+d^2$, then $a, b, c, d$ are all even; and so deduce that $r_4(2^k m) = r_4(2^{k-2}m)$ if $k\ge 3$.
(b)* Prove that if $m$ is odd, then $r_4(2^k m) = 3r_4(m)$ for all $k\ge 1$.


We now prove that $r_4(n) = 8\sigma(n)$ when $n$ is odd. The result for $n$ even follows from exercise 12.18.2.


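We saw that $r_2(0) = 1$ and
$$r_2(n) = 4\sum_{d\mid n}\left(\frac{-4}{d}\right) = 4\Big(\sum_{\substack{ab=n\\ a\equiv 1\ (\mathrm{mod}\ 4)}} 1 - \sum_{\substack{ab=n\\ a\equiv -1\ (\mathrm{mod}\ 4)}} 1\Big)$$
for all $n\ge 1$ (writing $a = d$ and $b = n/d$), so that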


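$$\begin{aligned} r_4(n) - 2r_2(n) = \sum_{\substack{m+M=n\\ m,M\ge1}} r_2(m)r_2(M) &= 16\Big(\sum_{\substack{ab+AB=n\\ a\equiv A\ (\mathrm{mod}\ 4)\\ a\ \mathrm{odd}}} 1 - \sum_{\substack{ab+AB=n\\ a\equiv -A\ (\mathrm{mod}\ 4)\\ a\ \mathrm{odd}}} 1\Big)\\ &= 16\Big(\sum_{\substack{ab+AB=n\\ a\equiv A\ (\mathrm{mod}\ 4)}} 1 - \sum_{\substack{ab+AB=n\\ a\equiv -A\ (\mathrm{mod}\ 4)}} 1\Big). \end{aligned}$$
Here we write $m = ab$ and $M = AB$ and apply the above formula for $r_2(m)$ and $r_2(M)$. We have added into both sums the terms for which $a$ is even, since then the condition $a \equiv A \pmod 4$ is the same as the condition $a \equiv -A \pmod 4$.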


We split the first sum into three parts: If $a > A$, write $a = A+4c$ with $c\ge 1$, so we have $n = ab+AB = (A+4c)b+AB = Ad+4cb$ with $d = b+B > b$ and $Ad$ odd. Therefore
$$\sum_{\substack{ab+AB=n\\ a\equiv A\ (\mathrm{mod}\ 4)\\ a>A}} 1 = \sum_{\substack{Ad+4bc=n\\ d>b}} 1.$$
We take the same approach and get the same quantity if $a < A$. If $a = A$, then we have $n = ab+AB = a(b+B)$. This is then a sum over divisors $d$ of $n$, writing $a = n/d$, so there are $d-1$ choices of pairs with $b+B = d$. This yields, in total, $\sum_{d\mid n}(d-1) = \sigma(n) - \tau(n)$ possibilities. Therefore, changing variables, we have
$$\sum_{\substack{ab+AB=n\\ a\equiv A\ (\mathrm{mod}\ 4)}} 1 = 2\sum_{\substack{ab+4AB=n\\ a>A}} 1 + \sigma(n) - \tau(n).$$


We also split the second sum into three parts: If $b > B$, write $b = B+C$, so we have
$$n = ab+AB = a(B+C)+AB = (a+A)B + aC = 4dB + aC,$$
as $a+A \equiv 0 \pmod 4$ (writing $a+A = 4d$), with $4d > a$; and analogously if $b < B$. If $b = B$, then we have $n = ab+AB = (a+A)B$. This though is impossible, as 4 divides $a+A$, which divides the odd number $n$. Therefore, changing variables, we have
$$\sum_{\substack{ab+AB=n\\ a\equiv -A\ (\mathrm{mod}\ 4)}} 1 = 2\sum_{\substack{ab+4AB=n\\ b<4B}} 1.$$




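Now if $ab+4AB = n$ with $a < A$, write $A = a+c$ to have
$$n = ab + 4(a+c)B = a(b+4B) + 4Bc = ad + 4Bc,$$
where $d = b+4B > 4B$, and so
$$\sum_{\substack{ab+4AB=n\\ a<A}} 1 = \sum_{\substack{ab+4AB=n\\ b>4B}} 1.$$
Subtracting these quantities from the total number of solutions to $ab+4AB = n$ in positive integers, we get
$$\sum_{\substack{ab+4AB=n\\ a>A}} 1 - \sum_{\substack{ab+4AB=n\\ b<4B}} 1 = \sum_{\substack{ab+4AB=n\\ b=4B}} 1 - \sum_{\substack{ab+4AB=n\\ a=A}} 1.$$
We cannot have $b = 4B$, or else 4 divides $n$, so combining the above equations gives that
$$r_4(n) - 2r_2(n) = 16\Big(\sigma(n) - \tau(n) - 2\sum_{\substack{ab+4AB=n\\ a=A}} 1\Big).$$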


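If $ab+4AB = n$ with $a = A$, then $n = a(b+4B)$, and so if $d \mid n$ with $a = n/d$, then there are
$$\frac{d-2}{4} + \frac{1}{4}\left(\frac{-1}{d}\right)$$
possibilities for $b+4B = d$, as $d$ is odd. Therefore
$$\sum_{\substack{ab+4AB=n\\ a=A}} 1 = \sum_{d\mid n}\left(\frac{d-2}{4} + \frac{1}{4}\left(\frac{-1}{d}\right)\right) = \frac{1}{4}\sigma(n) - \frac{1}{2}\tau(n) + \frac{r_2(n)}{16}.$$
Substituting this in above we obtain
$$r_4(n) = 8\sigma(n),$$
as desired.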


Exercise 12.18.3. (a) Establish the identity
$$4(a^2+b^2+c^2+d^2) = (a+b+c+d)^2 + (a+b-c-d)^2 + (a-b+c-d)^2 + (a-b-c+d)^2.$$

(b)† Suppose that $m \equiv n \pmod 2$. Prove that there exist integers $a, b, c, d$ for which $n = a^2+b^2+c^2+d^2$ with $m = a+b+c+d$ if and only if $4n - m^2$ can be written as the sum of three squares.
Henceforth let $n$ be odd.

(c) Deduce that for every odd integer $m$ with $|m| < 2\sqrt{n}$ there exist integers $a, b, c, d$ for which $n = a^2+b^2+c^2+d^2$ and $m = a+b+c+d$.

(d) Show that there is a 1-to-1 correspondence between solutions to $4n - m^2 = u^2+v^2+w^2$ with $u \equiv v \equiv w \equiv m \pmod 4$ and solutions to $n = a^2+b^2+c^2+d^2$, $m = a+b+c+d$, with $a \not\equiv b \equiv c \equiv d \pmod 2$.

(e) Deduce that there is a 2-to-1 correspondence between solutions to $4n - m^2 = u^2+v^2+w^2$ and solutions to $n = a^2+b^2+c^2+d^2$, $m = a+b+c+d$.

(f) Using Gauss’s result mentioned at the end of section 12.14 of appendix 12D, deduce that if $n$ is odd, then there are $12h(m^2-4n)$ solutions to $n = a^2+b^2+c^2+d^2$, $m = a+b+c+d$.


Appendix 12F. Universality 


12.19. Universality of quadratic forms 


A universal quadratic form represents all positive integers. We have seen that $x^2+y^2+z^2+w^2$ is universal, while $x^2+y^2+z^2$ is not. One can show that no binary quadratic form is universal as a consequence of Proposition 12.3.1(i). Using a suitable generalization of the local-global principle (in the form of the Hilbert symbol; see appendix 9B) one can show that no positive definite ternary quadratic form is universal, though it is easy to show that $x^2-y^2+2z^2$ is universal.


Lemma 12.19.1. No positive definite diagonal ternary quadratic form $ax^2+by^2+cz^2$ is universal.


Proof. Assume $1\le a\le b\le c$ without loss of generality. To represent 1 we must have $a = 1$. To represent 2 we therefore need $b = 1$ or 2. The quadratic form $x^2+y^2+cz^2$ does not represent 3 if $c > 3$, and it does not represent 7, 14, or 6 if $c = 1$, 2, or 3, respectively. The quadratic form $x^2+2y^2+cz^2$ does not represent 5 if $c > 5$, and it does not represent 7, 10, 14, or 10 if $c = 2$, 3, 4, or 5, respectively.
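Each case in this proof is a finite check. A brute-force sketch (the helper name `represents` is hypothetical; coefficients are assumed positive) confirms the particular values the proof says are missed, and the two "$c$ large" cases for a range of $c$.

```python
def represents(coeffs, n):
    """True if n = sum_i coeffs[i] * x_i^2 for some integers x_i (coefficients positive)."""
    if not coeffs:
        return n == 0
    c, rest = coeffs[0], coeffs[1:]
    x = 0
    while c * x * x <= n:
        if represents(rest, n - c * x * x):
            return True
        x += 1
    return False

# The value each form misses, according to the proof.
cases = {(1, 1, 1): 7, (1, 1, 2): 14, (1, 1, 3): 6,
         (1, 2, 2): 7, (1, 2, 3): 10, (1, 2, 4): 14, (1, 2, 5): 10}
for coeffs, missed in cases.items():
    print(coeffs, missed, represents(coeffs, missed))   # always False
```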


Let $[a,b,c,d]$ denote the quadratic form $ax^2+by^2+cz^2+dw^2$ with $a, b, c, d > 0$. In 1916 Ramanujan¹⁰ asserted that the following quaternary quadratic forms are


¹⁰Srinivasa Ramanujan was a self-taught Indian mathematician who sent letters with his discoveries but no proofs to prominent mathematicians, most of whom largely ignored him. In 1913 he wrote to G. H. Hardy, a professor at Trinity College, Cambridge, who was impressed by some of these formulas, including
$$1 - 5\left(\frac{1}{2}\right)^3 + 9\left(\frac{1\cdot 3}{2\cdot 4}\right)^3 - 13\left(\frac{1\cdot 3\cdot 5}{2\cdot 4\cdot 6}\right)^3 + \cdots = \frac{2}{\pi}.$$
Although Hardy could not prove this, nor several of Ramanujan’s other results, he figured “they must be true, because ... no one would have the imagination to invent them.” Hardy brought Ramanujan to Cambridge, England, in 1914. Ramanujan spent the next five years proving and publishing great theorems, in a wide variety of subjects, under Hardy’s guidance (several of which will appear in the next few chapters). Sadly, Ramanujan died in 1920 of tuberculosis at the age of 32, having suffered several health problems in the damp English climate. Most of the results in his notebooks, stated there without proof, were only proved and published many years later. We are still not sure of the intuition that led him to many of his results; researchers continue to try to recreate and appreciate his thinking up to the present day. The world’s leading prize for a young number theorist is the Sastra-Ramanujan




universal: $[1,1,1,k]$, $[1,2,2,k]$ for $1\le k\le 7$; $[1,1,2,k]$, $[1,2,4,k]$ for $1\le k\le 14$; $[1,1,3,k]$ for $1\le k\le 6$; $[1,2,3,k]$, $[1,2,5,k]$ for $1\le k\le 10$; though this is not quite true for $[1,2,5,5]$, that is, $x^2+2y^2+5z^2+5w^2$, since it represents every positive integer except 15. This is a lovely theorem but difficult to state or remember. In 1938 Halmos observed that a much simpler and more memorable way to recall Ramanujan’s result is to note that $ax^2+by^2+cz^2+dw^2$ is universal if and only if it represents every positive integer $\le 15$.
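Halmos’s observation can be explored by direct search. This sketch lists the values up to a chosen `limit` that a positive diagonal form misses; a finite search like this is evidence, not a proof of universality. The helper names are hypothetical. It confirms that $x^2+2y^2+5z^2+5w^2$ misses only 15 among the integers up to 200. It also shows that a non-universal form such as $[1,1,1,8]$ already fails at some value $\le 15$.

```python
def represents(coeffs, n):
    """True if n = sum_i coeffs[i] * x_i^2 for some integers x_i (coefficients positive)."""
    if not coeffs:
        return n == 0
    c, rest = coeffs[0], coeffs[1:]
    x = 0
    while c * x * x <= n:
        if represents(rest, n - c * x * x):
            return True
        x += 1
    return False

def missed_values(coeffs, limit):
    """All n in [1, limit] not represented by the diagonal form with these coefficients."""
    return [n for n in range(1, limit + 1) if not represents(coeffs, n)]

print(missed_values((1, 2, 5, 5), 200))   # [15]
print(missed_values((1, 1, 1, 8), 15))    # [7, 15]
```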


It is of interest to classify all universal positive definite quadratic forms in any number of variables. In 1993 Conway observed that Halmos’s observation extends to this question.


Theorem 12.7 (The Fifteen Criterion, I). Suppose that $f$ is a positive definite diagonal quadratic form. Then $f$ represents all positive integers if and only if $f$ represents all positive integers $\le 15$.


Proof assuming Ramanujan’s Theorem. Suppose that $f = a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2$ with $1\le a_1\le a_2\le\cdots\le a_n$ represents all positive integers. Since $f$ represents 1 we must have $a_1 = 1$. Since $f$ represents 2 we must have $a_2 = 1$ or 2. If $a_1 = a_2 = 1$, then, since $f$ represents 3, we must have $a_3 = 1$, 2, or 3. If $a_1 = 1$, $a_2 = 2$, then, since $f$ represents 5, we must have $a_3 = 2$, 3, 4, or 5. Now

 $x_1^2+x_2^2+x_3^2$ represents $m$, $1\le m\le 6$, but not 7, and so $1\le a_4\le 7$;
 $x_1^2+x_2^2+2x_3^2$ represents $m$, $1\le m\le 13$, but not 14, and so $1\le a_4\le 14$;
 $x_1^2+x_2^2+3x_3^2$ represents $m$, $1\le m\le 5$, but not 6, and so $1\le a_4\le 6$;
 $x_1^2+2x_2^2+2x_3^2$ represents $m$, $1\le m\le 6$, but not 7, and so $1\le a_4\le 7$;
 $x_1^2+2x_2^2+3x_3^2$ represents $m$, $1\le m\le 9$, but not 10, and so $1\le a_4\le 10$;
 $x_1^2+2x_2^2+4x_3^2$ represents $m$, $1\le m\le 13$, but not 14, and so $1\le a_4\le 14$;
 $x_1^2+2x_2^2+5x_3^2$ represents $m$, $1\le m\le 9$, but not 10, and so $1\le a_4\le 10$.

Ramanujan’s result implies that, for these possibilities for $a_1, a_2, a_3, a_4$, the quadratic form $a_1x_1^2+a_2x_2^2+a_3x_3^2+a_4x_4^2$ represents every positive integer except perhaps 15, and the result follows.


Actually one only needs to verify that 
1,2,3,5,6,7,10,14, and 15 
are represented; that is, f represents all positive integers if and only if f represents 
each of 1, 2, 3, 5, 6, 7, 10, 14, and 15. Conway and Schneeberger observed that this 


result generalizes rather nicely in that one can decide whether any positive definite 
quadratic form with even cross coefficients is universal in much the same way: 


Theorem 12.8 (The Fifteen Criterion, II). Suppose that f is a positive definite 
quadratic form, which is diagonal mod 2. Then f represents all positive integers if 
and only if f represents 1, 2, 3, 5, 6, 7, 10, 14, and 15. 


prize, presented only to researchers who are no older than Ramanujan was at the time of his death. Ramanujan’s story is well told in the 2015 movie The Man Who Knew Infinity.




This is sharp in the sense that for each integer m in the list there is such a quadratic form that represents every positive integer except m. For example, we already saw that $x^2 + 2y^2 + 5z^2 + 5w^2$ represents every positive integer other than 15, and $2x^2 + 3y^2 + 4z^2 + 5w^2$ represents every positive integer other than 1, etc.
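Here is a quick numerical check of the claim about $x^2 + 2y^2 + 5z^2 + 5w^2$. The search bound 1000 is an arbitrary choice of ours. A finite search illustrates, but of course does not prove, that 15 is the only exception.

```python
from math import isqrt


def missed(limit):
    """Positive integers <= limit that are NOT of the form x^2 + 2y^2 + 5z^2 + 5w^2."""
    hit = [False] * (limit + 1)
    r = isqrt(limit)
    for x in range(r + 1):
        for y in range(r + 1):
            s2 = x * x + 2 * y * y
            if s2 > limit:
                break  # larger y only makes the value bigger
            for z in range(r + 1):
                s3 = s2 + 5 * z * z
                if s3 > limit:
                    break
                for w in range(r + 1):
                    s4 = s3 + 5 * w * w
                    if s4 > limit:
                        break
                    hit[s4] = True
    return [n for n in range(1, limit + 1) if not hit[n]]


print(missed(1000))  # → [15]
```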


This was extended to all quadratic forms by Bhargava and Hanke: 


Theorem 12.9 (The 290 Criterion). Suppose that f is a positive definite quadratic form. Then f represents all positive integers if and only if f represents all positive integers $\le 290$.


This is sharp since $x^2 + xy + 2y^2 + xz + 4z^2 + 29(a^2 + ab + b^2)$ represents every positive integer other than 290. In fact f represents all positive integers if and only if it represents the 29 integers


1,2, 3,5,6,7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 
30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, and 290. 


This set is also minimal in that for each integer m in the list there is a positive 
definite quadratic form that represents all positive integers except m. 


The Fifteen Criterion has been generalized to representations of any set S of 
positive integers by Bhargava: There exists a finite subset T of S such that a positive
definite quadratic form f, which is diagonal mod 2, represents every integer in S if 
and only if it represents every integer in T. Hence Bhargava has reduced any such 
classification problem to a finite problem. For example, such an f represents all 
primes if and only if f represents the 15 primes up to 47 as well as 67 and 73. 


Here are a few nice examples that have been worked out explicitly: 


• (Bhargava) A positive definite quadratic form, which is diagonal mod 2, represents every odd positive integer if and only if it represents 1, 3, 5, 7, 11, 15, and 33.

• (Rouse, under certain mild assumptions) A positive definite quadratic form represents every odd positive integer if and only if it represents every odd integer $\le 451$.

• (Crystel Bujold) A positive definite ternary quadratic form, which is diagonal mod 2, represents every odd positive integer if and only if it represents 1, 3, 5, 7, 11, and 15.


In 2015, Kenneth Williams wrote a delightful article in which he observed some further such results: If a, b, c are positive integers, then $ax^2 + by^2 + cz^2$ represents all odd positive integers if and only if it represents 1, 3, 5, and 15. Moreover $ax^2 + by^2 + cz^2$ represents all positive integers $\equiv 2 \pmod 4$ if and only if it represents 2, 6, 10, 14, and 30.
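Williams's first criterion can be spot-checked numerically. In this sketch the two example forms and the search bound 999 are our own choices. The first form, $x^2 + y^2 + 2z^2$, represents 1, 3, 5, and 15, and it misses no odd integer up to the bound. The second form, $x^2 + y^2 + z^2$, fails to represent 15, and it does miss odd integers such as 7.

```python
def ternary_hits(a, b, c, limit):
    """hit[n] is True iff n = a*x^2 + b*y^2 + c*z^2 for some integers x, y, z (0 <= n <= limit)."""
    hit = [False] * (limit + 1)
    x = 0
    while a * x * x <= limit:
        y = 0
        while a * x * x + b * y * y <= limit:
            z = 0
            while a * x * x + b * y * y + c * z * z <= limit:
                hit[a * x * x + b * y * y + c * z * z] = True
                z += 1
            y += 1
        x += 1
    return hit


def odd_misses(a, b, c, limit=999):
    """Odd positive integers <= limit not represented by the form."""
    hit = ternary_hits(a, b, c, limit)
    return [n for n in range(1, limit + 1, 2) if not hit[n]]


def passes_williams_test(a, b, c):
    """Does the form represent each of 1, 3, 5 and 15?"""
    hit = ternary_hits(a, b, c, 15)
    return all(hit[n] for n in (1, 3, 5, 15))


for form in [(1, 1, 2), (1, 1, 1)]:
    print(form, passes_williams_test(*form), odd_misses(*form)[:5])
```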


References for this appendix 


[1] Manjul Bhargava and Jonathan Hanke, Universal quadratic forms and the 290-Theorem (preprint). 
[2] John H. Conway, The sensual (quadratic) form, Carus Mathematical Monographs (26), 1997.


[3] Kenneth S. Williams, A “four integers” theorem and a “five integers” theorem, Amer. Math. 
Monthly 122 (2015), 528-536. 


Appendix 12G. Integers represented in Apollonian circle packings


Sarnak studied the set of curvatures of circles surrounding a fixed circle in any given integer Apollonian circle packing (see section 9.15 of appendix 9D). We will suppose we are given integers a, b, c, d satisfying $2(a^2+b^2+c^2+d^2) = (a+b+c+d)^2$, so that a+b+c+d is even. We fix the circle of curvature a and make the change of variables
$$x = a+b, \qquad y = a+c \qquad\text{and}\qquad z = \frac{a+b+c-d}{2},$$
so our equation becomes
$$a^2 + z^2 - xy = 0.$$
The transformations of (a, b, c, d) given by $b \mapsto 2(a+c+d) - b$, $c \mapsto 2(a+b+d) - c$, and $d \mapsto 2(a+b+c) - d$ become the transformations of (x, z, y) given by
$$\begin{pmatrix}x\\ z\\ y\end{pmatrix} \mapsto M\begin{pmatrix}x\\ z\\ y\end{pmatrix}$$
for $M = M_b, M_c, M_d$, respectively, with
$$M_b := \begin{pmatrix}1&-4&4\\ 0&-1&2\\ 0&0&1\end{pmatrix}, \quad M_c := \begin{pmatrix}1&0&0\\ 2&-1&0\\ 4&-4&1\end{pmatrix}, \quad\text{and}\quad M_d := \begin{pmatrix}1&0&0\\ 0&-1&0\\ 0&0&1\end{pmatrix}.$$
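The three matrices can be checked directly against the Descartes moves. This sketch uses the quadruple $(a, b, c, d) = (21, 24, 28, -11)$, which satisfies the Descartes relation. It is consistent with the "21, 24, 28 example" whose values $45m^2 + 84mn + 49n^2 - 21$ appear later in this appendix. The helper names are illustrative.

```python
def descartes_ok(a, b, c, d):
    """Descartes relation 2(a^2+b^2+c^2+d^2) = (a+b+c+d)^2."""
    return 2 * (a * a + b * b + c * c + d * d) == (a + b + c + d) ** 2


def xzy(a, b, c, d):
    """The change of variables from the text, returned in the order (x, z, y)."""
    return (a + b, (a + b + c - d) // 2, a + c)


def matvec(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))


M_b = ((1, -4, 4), (0, -1, 2), (0, 0, 1))
M_c = ((1, 0, 0), (2, -1, 0), (4, -4, 1))
M_d = ((1, 0, 0), (0, -1, 0), (0, 0, 1))

a, b, c, d = 21, 24, 28, -11
assert descartes_ok(a, b, c, d)
x, z, y = xzy(a, b, c, d)
assert a * a + z * z - x * y == 0          # the equation a^2 + z^2 - xy = 0

# Each Descartes move, paired with the matrix that should describe it on (x, z, y)
moves = {
    "b": ((a, 2 * (a + c + d) - b, c, d), M_b),
    "c": ((a, b, 2 * (a + b + d) - c, d), M_c),
    "d": ((a, b, c, 2 * (a + b + c) - d), M_d),
}
for name, (quad, M) in moves.items():
    assert descartes_ok(*quad)
    assert xzy(*quad) == matvec(M, (x, z, y)), name
    print(name, quad, xzy(*quad))
```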


12.20. Combining these linear transformations 


We wish to determine which matrices are given by the products of $M_b$, $M_c$, $M_d$, taken arbitrarily often in arbitrary order, or at least to find a large and interesting subset of the set of products. Sarnak found a way to fit this question into known work on matrices by making an extraordinary observation (which appears in rather different form in Gauss’s work): There is a map from 2-by-2 matrices to certain 3-by-3 matrices given by
$$p\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} = \begin{pmatrix}\alpha^2&2\alpha\gamma&\gamma^2\\ \alpha\beta&\alpha\delta+\beta\gamma&\gamma\delta\\ \beta^2&2\beta\delta&\delta^2\end{pmatrix},$$
which preserves multiplication though reverses the order, $p(AB) = p(B)p(A)$. The images of
$$\begin{pmatrix}1&-2\\ 0&1\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}1&0\\ 2&1\end{pmatrix}$$
under the map p yield the matrices $M_dM_c$ and $M_bM_d$, respectively.
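The two facts about p used here can be tested mechanically: p reverses products, and it sends the two generators to $M_dM_c$ and $M_bM_d$. The sketch below checks $p(AB) = p(B)p(A)$ on random integer matrices, which is a spot check and not a proof. It verifies the two images exactly.

```python
import random


def p(A):
    """The map from the text, sending (alpha beta; gamma delta) to a 3-by-3 matrix."""
    (al, be), (ga, de) = A
    return ((al * al, 2 * al * ga, ga * ga),
            (al * be, al * de + be * ga, ga * de),
            (be * be, 2 * be * de, de * de))


def mul(X, Y):
    """Product of two square integer matrices given as tuples of rows."""
    n = len(X)
    return tuple(tuple(sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n))
                 for i in range(n))


M_b = ((1, -4, 4), (0, -1, 2), (0, 0, 1))
M_c = ((1, 0, 0), (2, -1, 0), (4, -4, 1))
M_d = ((1, 0, 0), (0, -1, 0), (0, 0, 1))
I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

rng = random.Random(0)


def random_2x2():
    return tuple(tuple(rng.randint(-9, 9) for _ in range(2)) for _ in range(2))


for _ in range(1000):
    A, B = random_2x2(), random_2x2()
    assert p(mul(A, B)) == mul(p(B), p(A))   # the order is reversed

assert p(((1, -2), (0, 1))) == mul(M_d, M_c)
assert p(((1, 0), (2, 1))) == mul(M_b, M_d)
assert p(((-1, 0), (0, -1))) == I3           # p(-I) = I
print("p(AB) = p(B)p(A) on 1000 random pairs; generator images check out")
```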


These two matrices, together with $-I$ (and note that $p(-I) = I$), generate the congruence subgroup
$$G := \left\{\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} : \alpha,\beta,\gamma,\delta\in\mathbb{Z},\ \alpha\delta-\beta\gamma = 1, \text{ and } \begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} \equiv I \pmod 2\right\}.$$
We prove this by something akin to the Euclidean algorithm: We begin with a matrix $\begin{pmatrix}a&b\\ c&d\end{pmatrix} \in G$ and we will post-multiply this matrix by other elements of G so as to reduce the pair of integers a, b. If $|a| > |b| > 0$, let A be the absolutely least residue of a (mod $2|b|$), so that $A = a - 2kb$ for some integer k. Then
$$\begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}1&0\\ -2&1\end{pmatrix}^k = \begin{pmatrix}a&b\\ c&d\end{pmatrix}\begin{pmatrix}1&0\\ -2k&1\end{pmatrix} = \begin{pmatrix}A&b\\ C&d\end{pmatrix},$$
where $C = c - 2kd$, which also belongs to G (as G is a group and so closed under multiplication). What is important is that $|A| \le |b| < |a|$.
Similarly if $|b| > |a| > 0$, then we write $B = b - 2ka$ and post-multiply by $\begin{pmatrix}1&-2k\\ 0&1\end{pmatrix}$, and so we replace b by B in the matrix, where $|B| \le |a| < |b|$. This algorithm terminates in a finite number of steps, when either $|a| = |b|$ or one of a and b is 0. But a is odd and b is even, and so b = 0 and ad = 1 since the determinant is 1, implying that either a = d = 1 or a = d = −1. If a = d = −1, we multiply through by $-I$ and so we have a = d = 1. Since c is even we post-multiply by $\begin{pmatrix}1&0\\ -2&1\end{pmatrix}^{c/2}$ to obtain the identity. The result is proved.
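The reduction above is effectively an algorithm. The following sketch implements it for matrices in G. It records each post-multiplication as a power of $\begin{pmatrix}1&0\\ -2&1\end{pmatrix}$ (labelled "L") or of $\begin{pmatrix}1&-2\\ 0&1\end{pmatrix}$ (labelled "U"). The labels and function names are our own.

```python
import random


def mat_mul(X, Y):
    """2-by-2 integer matrix product; matrices are ((a, b), (c, d))."""
    return ((X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]),
            (X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]))


def gen_power(name, k):
    """('L', k) -> (1 0; -2 1)^k = (1 0; -2k 1);  ('U', k) -> (1 -2; 0 1)^k = (1 -2k; 0 1)."""
    return ((1, 0), (-2 * k, 1)) if name == "L" else ((1, -2 * k), (0, 1))


def least_residue(a, mod):
    """Absolutely least residue of a modulo an even modulus: -mod/2 < r <= mod/2."""
    r = a % mod
    return r - mod if r > mod // 2 else r


def reduce_gamma2(M):
    """Return (word, s) with M * prod(gen_power(*w) for w in word) == s * I, s = +1 or -1."""
    (a, b), (c, d) = M
    assert a * d - b * c == 1 and a % 2 == d % 2 == 1 and b % 2 == c % 2 == 0
    word = []
    while b != 0:  # a is odd and b even, so a != 0 and |a| != |b|
        if abs(a) > abs(b):
            A = least_residue(a, 2 * abs(b))
            k = (a - A) // (2 * b)
            a, c = A, c - 2 * k * d          # post-multiply by (1 0; -2k 1)
            word.append(("L", k))
        else:
            B = least_residue(b, 2 * abs(a))
            k = (b - B) // (2 * a)
            b, d = B, d - 2 * k * c          # post-multiply by (1 -2k; 0 1)
            word.append(("U", k))
    s = a                                    # now b = 0, so a = d = s = +1 or -1
    word.append(("L", (c * s) // 2))         # clears the (even) lower-left entry
    return word, s


def check(M):
    word, s = reduce_gamma2(M)
    P = M
    for name, k in word:
        P = mat_mul(P, gen_power(name, k))
    return P == ((s, 0), (0, s))


rng = random.Random(1)
for _ in range(500):
    M = ((1, 0), (0, 1))
    for _ in range(rng.randint(0, 8)):
        M = mat_mul(M, gen_power(rng.choice("LU"), rng.randint(-3, 3)))
    if rng.random() < 0.5:
        M = tuple(tuple(-e for e in row) for row in M)   # include the factor -I
    assert check(M)

print(reduce_gamma2(((5, 2), (2, 1))))  # → ([('L', 1), ('U', 1), ('L', 0)], 1)
```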


We have therefore proved that the matrices generated by $M_bM_d$ and $M_dM_c$ all take the form
$$\begin{pmatrix}\alpha^2&2\alpha\gamma&\gamma^2\\ \alpha\beta&\alpha\delta+\beta\gamma&\gamma\delta\\ \beta^2&2\beta\delta&\delta^2\end{pmatrix} \quad\text{where}\quad \begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} \in G.$$
We wish to extend this to the group generated by $M_b$, $M_c$, $M_d$, which means we have to take into account the effect of $M_d$ only, as $M_d^2 = I$. Now if we pre-multiply one of our matrices by $M_d$, this has the same effect as changing the signs of α and γ; and if we post-multiply one of our matrices by $M_d$, this has the same effect as changing the signs of α and β. Both cases are accounted for simply by replacing G by the group
$$H := \left\{\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} : \alpha,\beta,\gamma,\delta\in\mathbb{Z},\ \alpha\delta-\beta\gamma = 1 \text{ or } -1, \text{ and } \begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} \equiv I \pmod 2\right\}.$$


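Tracing back through the argument above, we find that a new triple of circles, mutually tangent with each other and with the one of curvature a, have curvatures B, C, and D given by
$$\begin{pmatrix}B\\ D\\ C\end{pmatrix} = \begin{pmatrix}1&0&0\\ 1&-2&1\\ 0&0&1\end{pmatrix}\begin{pmatrix}\alpha^2&2\alpha\gamma&\gamma^2\\ \alpha\beta&\alpha\delta+\beta\gamma&\gamma\delta\\ \beta^2&2\beta\delta&\delta^2\end{pmatrix}\begin{pmatrix}a+b\\ (a+b+c-d)/2\\ a+c\end{pmatrix} - \begin{pmatrix}a\\ a\\ a\end{pmatrix}$$
for any $\begin{pmatrix}\alpha&\beta\\ \gamma&\delta\end{pmatrix} \in H$. We deduce that B equals
$$(\alpha^2+\alpha\gamma+\gamma^2-1)a + (\alpha^2+\alpha\gamma)b + (\alpha\gamma+\gamma^2)c - \alpha\gamma d,$$
C the same with α, γ replaced by β, δ, and D the same with α, γ replaced by α − β, γ − δ, respectively. Modulo 2 these pairs are (1, 0), (0, 1), and (1, 1), so they are all different.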


Now consider the quadratic polynomial
$$f(m,n) = (m^2+mn+n^2-1)a + (m^2+mn)b + (mn+n^2)c - mnd,$$
when m and n are coprime integers.

(i) If m is odd and n is even, let α = m and γ = n, so there exist integers τ, δ for which $\alpha\delta - \tau\cdot 2\gamma = 1$, and let β = 2τ. Thus we have an element of H and f(m,n) = B.

(ii) If m is even and n is odd, let β = m and δ = n, so there exist integers τ, α for which $\alpha\delta - \tau\cdot 2\beta = 1$, and let γ = 2τ. Thus we have an element of H and f(m,n) = C.

(iii) If m and n are both odd, then there exist integers r, s for which mr − ns = 1. If r is odd, then s is even and we let δ = r, β = s; if r is even, then we let δ = r + n, β = s + m. Either way, δ is odd and β is even. We let α = β + m, γ = δ + n. Thus we have an element of H and f(m,n) = D.
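Cases (i)-(iii) translate into a short computation. The sketch below again uses the example $(a, b, c, d) = (21, 24, 28, -11)$. For each coprime pair (m, n) in a small box, it builds the matrix in H, checks which of B, C, D equals f(m, n), and checks that (a, B, C, D) satisfies the Descartes relation. The extended Euclidean helper is a standard routine and is not taken from the text.

```python
from math import gcd

a, b, c, d = 21, 24, 28, -11   # the running example; the fixed circle has curvature a = 21


def curvature(u, v):
    """(u^2+uv+v^2-1)a + (u^2+uv)b + (uv+v^2)c - uv*d, i.e. f(u, v) in the text."""
    return (u * u + u * v + v * v - 1) * a + (u * u + u * v) * b + (u * v + v * v) * c - u * v * d


def ext_gcd(u, v):
    """Return (g, s, t) with s*u + t*v == g == gcd(u, v) >= 0."""
    old_r, r, old_s, s, old_t, t = u, v, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def matrix_in_H(m, n):
    """(alpha, beta, gamma, delta) built as in cases (i)-(iii), plus which curvature f(m, n) is."""
    if m % 2 == 1 and n % 2 == 0:                      # case (i)
        _, s, t = ext_gcd(m, 2 * n)                    # s*m + t*2n = 1, so tau = -t
        return (m, -2 * t, n, s), "B"
    if m % 2 == 0 and n % 2 == 1:                      # case (ii)
        _, s, t = ext_gcd(n, 2 * m)                    # s*n + t*2m = 1, so tau = -t
        return (s, m, -2 * t, n), "C"
    _, s, t = ext_gcd(m, n)                            # case (iii): m*r - n*s2 = 1
    r, s2 = s, -t
    delta, beta = (r, s2) if r % 2 == 1 else (r + n, s2 + m)
    return (beta + m, beta, delta + n, delta), "D"


checked = 0
for m in range(-15, 16):
    for n in range(-15, 16):
        if gcd(m, n) != 1:
            continue
        (al, be, ga, de), label = matrix_in_H(m, n)
        assert al * de - be * ga == 1
        assert al % 2 == de % 2 == 1 and be % 2 == ga % 2 == 0      # congruent to I mod 2
        B, C, D = curvature(al, ga), curvature(be, de), curvature(al - be, ga - de)
        assert {"B": B, "C": C, "D": D}[label] == curvature(m, n)
        assert 2 * (a * a + B * B + C * C + D * D) == (a + B + C + D) ** 2
        checked += 1
print(checked, "coprime pairs checked")
```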


Therefore the set of curvatures of the circles generated in this way is precisely the set of values of f(m,n) as m and n vary over coprime integers. We can write f(m,n) = g(m,n) − a where
$$g(m,n) = (a+b)m^2 + (a+b+c-d)mn + (a+c)n^2,$$
a binary quadratic form of discriminant $-4a^2$. Note that $(a+b)g(m,n) = M^2 + (an)^2$ where $M = (a+b)m + \frac{a+b+c-d}{2}\,n$.

In our 21, 24, 28 example this leads to the values of $45m^2 + 84mn + 49n^2 - 21$. Letting $m \to m - n$ gives $45m^2 - 6mn + 10n^2 - 21$.
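The identities in the last paragraph, including the substitution $m \to m - n$, can be checked symbolically. Alternatively, as in this sketch, one can evaluate them on a grid of integers. The sketch also lists a few prime values of f(m, n) over coprime (m, n). Its trial-division primality test is only meant for small numbers.

```python
from math import gcd, isqrt

a, b, c, d = 21, 24, 28, -11   # the 21, 24, 28 example, with fixed circle a = 21


def f(m, n):
    return (m * m + m * n + n * n - 1) * a + (m * m + m * n) * b + (m * n + n * n) * c - m * n * d


def g(m, n):
    return (a + b) * m * m + (a + b + c - d) * m * n + (a + c) * n * n


assert (a + b + c - d) ** 2 - 4 * (a + b) * (a + c) == -4 * a * a   # discriminant of g

for m in range(-20, 21):
    for n in range(-20, 21):
        assert f(m, n) == g(m, n) - a == 45 * m * m + 84 * m * n + 49 * n * n - 21
        twoM = 2 * (a + b) * m + (a + b + c - d) * n                 # 2M, kept integral
        assert 4 * (a + b) * g(m, n) == twoM * twoM + 4 * (a * n) ** 2
        assert f(m - n, n) == 45 * m * m - 6 * m * n + 10 * n * n - 21


def is_prime(k):
    return k > 1 and all(k % q for q in range(2, isqrt(k) + 1))


prime_curvatures = sorted({f(m, n) for m in range(-6, 7) for n in range(-6, 7)
                           if gcd(m, n) == 1 and is_prime(f(m, n))})
print(prime_curvatures[:10])
```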


In 1974 Iwaniec proved that any quadratic polynomial in two variables m and n, subject to certain obvious necessary conditions, takes on infinitely many prime values.⁸ In our case this means that f(m,n) takes on infinitely many prime values with (m,n) = 1 provided (a,b,c,d) = 1 and a ≠ 0. We therefore deduce that for any circle of curvature a in our packing, there are infinitely many circles in the packing that are tangent to a and have prime curvature. Take any one of those circles of prime curvature and make that the new value for a. Then the same calculation reveals that there are infinitely many circles of prime curvature in the packing that are tangent to our original circle of prime curvature. We have therefore proved the Apollonian twin primes conjecture.

⁸Iwaniec does not insist that (m,n) = 1 but this restriction can be incorporated into his proof.

Exercise 12.20.1. Suppose that we are given four mutually tangent circles with integer curvatures, and create an Apollonian circle packing from them. Prove that there is an infinite chain of distinct circles in the packing with prime curvatures $p_1 < p_2 < \cdots$ where the circles of curvatures $p_m$ and $p_{m+1}$ are mutually tangent, for every $m \ge 1$.


Further reading on Apollonian packings 


[1] Elena Fuchs, Counting problems in Apollonian packings, Bull. Amer. Math. Soc. 50 (2013), 229-266.


[2] Alex Kontorovich, From Apollonius to Zaremba: Local-global phenomena in thin orbits, Bull. Amer. 
Math. Soc. 50 (2013), 187-228. 


[3] Peter Sarnak, Integral Apollonian packings, Amer. Math. Monthly 118 (2011), 291-306. 


Chapter 13 


The anatomy of integers 


One studies an organism’s anatomy by breaking it up into its smallest possible 
meaningful indecomposable components. In biology that is the DNA, and then 
a subject is easily identified as each has unique DNA. For integers we study their 
prime factors: Every integer may be broken up in a unique way into primes, and ev- 
ery given set of primes (including repetitions) gives rise to a unique integer. Similar 
remarks might be made about polynomials (the components being the irreducible 
polynomials), and even permutations, which can be partitioned up into cycles. 

In this chapter we study what the prime factors of a typical integer n look like: 
their quantity, their size, how they combine to make divisors of n, etc.; and we 
discuss several natural consequences of these musings. 


13.1. Rough estimates for the number of integers with a fixed number 
of prime factors 


The prime number theorem (as discussed in section 5.4) tells us that there are about x/log x integers up to x with exactly one prime factor. In other words, a proportion of 1 in log x of the integers up to x is prime, which is a vanishing proportion as $x \to \infty$. So primes are rare. One can use this approximation to estimate the number of integers up to x that are the product of two primes: If $pq \le x$, where $p \le q$ are prime, then $p \le \sqrt{x}$ and $p \le q \le x/p$, and so the number of such integers is
$$\sum_{\substack{p\le\sqrt{x}\\ p\text{ prime}}}\ \sum_{\substack{p\le q\le x/p\\ q\text{ prime}}} 1 = \sum_{\substack{p\le\sqrt{x}\\ p\text{ prime}}}\big(\pi(x/p) - \pi(p) + 1\big).$$
Using the prime number theorem on the first term in this last sum, this is roughly
$$\sum_{\substack{p\le\sqrt{x}\\ p\text{ prime}}}\frac{x/p}{\log(x/p)}, \quad\text{which is}\quad \asymp \frac{x}{\log x}\sum_{\substack{p\le\sqrt{x}\\ p\text{ prime}}}\frac1p,$$
as $\frac12\log x \le \log(x/p) \le \log x$ for primes $p \le \sqrt{x}$. This last sum is about log log x, by (5.12.4). The final two terms in the above sum add up to $-\sum_{p\le\sqrt{x}}(\pi(p) - 1)$, and
$$0 \le \sum_{\substack{p\le\sqrt{x}\\ p\text{ prime}}}(\pi(p) - 1) \le \pi(\sqrt{x})^2 \le \frac{cx}{(\log x)^2}$$
for some constant c, by (5.5.1). This is much smaller than the other term. Combining these estimates (and being more precise, though at the cost of quite a few details) one can prove that there are
$$\sim \frac{x\log\log x}{\log x}$$
integers up to x that are the product of two primes.


One can further develop this argument to prove by induction that, for any fixed integer $k \ge 1$, N(x,k), the number of integers up to x with exactly k prime factors, is
$$\sim \frac{x}{\log x}\,\frac{(\log\log x)^{k-1}}{(k-1)!}.$$
In other words, the probability that a randomly chosen positive integer $\le x$ has exactly k = m + 1 prime factors is $\frac1x N(x,m+1) \sim e^{-\lambda}\lambda^m/m!$ where $\lambda = \log\log x$. This looks much like the Poisson distribution, from probability theory, with mean log log x. If this formula holds for all m (which can be allowed to grow as x grows), then we can deduce that a typical integer has around log log x prime factors. It is hard to prove that this formula (or something like this formula) holds for all relevant m, but we can establish that a typical integer has around log log x prime factors by different, simpler methods:


Exercise 13.1.1. Use Stirling’s formula (exercise 4.15.1) to show that if m is the nearest integer to λ, then $e^{-\lambda}\lambda^m/m!$ is roughly $1/\sqrt{2\pi\lambda}$. This suggests the fact that if m is the closest integer to log log x, then there are roughly $x/\sqrt{2\pi\log\log x}$ integers up to x with exactly m prime factors.


13.2. The number of prime factors of a typical integer 


The number $12 = 2^2\times 3$ has two or three prime factors depending on whether one counts the $2^2$ as one or two primes. So define
$$\omega(n) = \sum_{\substack{p\text{ prime}\\ p\mid n}}1 \quad\text{and}\quad \Omega(n) = \sum_{\substack{p\text{ prime},\ a\ge1\\ p^a\mid n}}1:$$
ω(n) counts the number of distinct prime factors of n, while Ω(n) counts the number of distinct prime powers that divide n, so that ω(12) = 2 and Ω(12) = 3. On average the difference between these two is
$$\frac1x\sum_{n\le x}(\Omega(n) - \omega(n)) = \frac1x\sum_{n\le x}\ \sum_{\substack{p\text{ prime},\ a\ge2\\ p^a\mid n}}1 = \sum_{\substack{p\text{ prime}\\ a\ge2}}\frac1x\sum_{\substack{n\le x\\ p^a\mid n}}1 \le \sum_{\substack{p\text{ prime}\\ a\ge2}}\frac1{p^a} = \sum_{p\text{ prime}}\frac1{p(p-1)} \le \sum_{n\ge2}\frac1{n(n-1)} = 1,$$
so we can work with either, as is convenient. It will make little difference.


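The average number of distinct prime factors of an integer $\le x$ can be obtained from calculating
$$\sum_{n\le x}\omega(n) = \sum_{n\le x}\ \sum_{\substack{p\text{ prime}\\ p\mid n}}1 = \sum_{\substack{p\le x\\ p\text{ prime}}}\ \sum_{\substack{n\le x\\ p\mid n}}1 = \sum_{\substack{p\le x\\ p\text{ prime}}}\left\lfloor\frac xp\right\rfloor.$$
Hence the average is approximately
$$\sum_{\substack{p\le x\\ p\text{ prime}}}\frac1p \approx \log\log x$$
by (5.12.4).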
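The identity $\sum_{n\le x}\omega(n) = \sum_{p\le x}\lfloor x/p\rfloor$ is easy to test. The following sketch sieves ω(n) up to $x = 10^5$, an arbitrary choice of ours. It compares the average with $\sum_{p\le x}1/p$ and with log log x. The difference between the first two lies in $[0, \pi(x)/x)$, because each $x/p - \lfloor x/p\rfloor$ lies in [0, 1).

```python
import math


def omega_table(x):
    """w[n] = number of distinct primes dividing n (0 <= n <= x), together with the primes up to x."""
    w = [0] * (x + 1)
    primes = []
    for p in range(2, x + 1):
        if w[p] == 0:                      # no smaller prime divides p, so p is prime
            primes.append(p)
            for m in range(p, x + 1, p):
                w[m] += 1
    return w, primes


x = 10 ** 5
w, primes = omega_table(x)
mean = sum(w[1:]) / x                      # average of omega(n) over n <= x
recip = sum(1 / p for p in primes)         # sum of 1/p over primes p <= x
print(mean, recip, math.log(math.log(x)))
```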


Exercise 13.2.1. (a) Prove that 


PS& 


(b) Deduce that
$$\lim_{x\to\infty}\left|\frac1x\sum_{n\le x}\omega(n) - \sum_{\substack{p\le x\\ p\text{ prime}}}\frac1p\right| = 0.$$


Calculations suggest that most integers have very few prime factors, but sometimes one gets limited information from calculations, especially for a function that we expect to grow like log log x. As Dan Shanks wrote: “Numbers at infinity are quite different from those that we see down here: the average number of their prime divisors increases like log log x and, while that increases very slowly, it increases without bound.”


We are going to go one step further and ask how much ω(n) varies from its mean; that is, we are going to compute the statistical quantity, the variance, using the following standard identity:


Exercise 13.2.2. Show that if $a_1, \dots, a_N$ have mean m, then
$$\frac1N\sum_{n\le N}(a_n - m)^2 = \frac1N\sum_{n\le N}a_n^2 - m^2.$$


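Therefore the variance of ω(n) from its mean, over the integers $n \le x$, is
$$\frac1x\sum_{n\le x}\Big(\omega(n) - \frac1x\sum_{m\le x}\omega(m)\Big)^2 = \frac1x\sum_{n\le x}\omega(n)^2 - \Big(\frac1x\sum_{m\le x}\omega(m)\Big)^2. \tag{13.2.1}$$
The first term here is
$$\frac1x\sum_{n\le x}\omega(n)^2 = \frac1x\sum_{n\le x}\ \sum_{\substack{p\text{ prime}\\ p\mid n}}\ \sum_{\substack{q\text{ prime}\\ q\mid n}}1 \le \sum_{\substack{p\le x\\ p\text{ prime}}}\frac1p + \sum_{\substack{p,q\text{ prime},\ q\ne p\\ pq\le x}}\frac1{pq} \le \sum_{\substack{p\le x\\ p\text{ prime}}}\frac1p + \Big(\sum_{\substack{p\le x\\ p\text{ prime}}}\frac1p\Big)^2.$$
The second term is roughly $\big(\frac1x\sum_{m\le x}\omega(m)\big)^2 \approx \big(\sum_{p\le x}\frac1p\big)^2$, and so the variance is bounded by roughly the first sum, $\sum_{p\le x}\frac1p \approx \log\log x$. Making this argument more precise one can show that
$$\frac1x\sum_{n\le x}(\omega(n) - \log\log x)^2 \le 2\log\log x. \tag{13.2.2}$$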
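Here is a similar sketch for the second moment. It checks the bound on $\frac1x\sum_{n\le x}\omega(n)^2$ derived above. It also compares the left side of (13.2.2) with $2\log\log x$ for $x = 10^5$, again our own choice.

```python
import math


def omega_table(x):
    """w[n] = number of distinct primes dividing n (0 <= n <= x), together with the primes up to x."""
    w = [0] * (x + 1)
    primes = []
    for p in range(2, x + 1):
        if w[p] == 0:                      # p is prime
            primes.append(p)
            for m in range(p, x + 1, p):
                w[m] += 1
    return w, primes


x = 10 ** 5
w, primes = omega_table(x)
L = math.log(math.log(x))
R = sum(1 / p for p in primes)                     # sum of 1/p over p <= x
second_moment = sum(v * v for v in w[1:]) / x      # (1/x) sum of omega(n)^2
V = sum((v - L) ** 2 for v in w[1:]) / x           # left side of (13.2.2)
mean = sum(w[1:]) / x
print(second_moment, R + R * R)
print(V, 2 * L)
print(second_moment - mean ** 2)                   # the variance (13.2.1) about the actual mean
```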


Exercise 13.2.3. (“Almost all” integers n have about log log n prime factors.) Show that (13.2.2) implies that there are $\le 2x/(\log\log x)^{1/3}$ integers $n \le x$ for which $|\omega(n) - \log\log x| > (\log\log x)^{2/3}$. In other words, we have $|\omega(n) - \log\log x| \le (\log\log x)^{2/3}$ for all but at most $2x/(\log\log x)^{1/3}$ integers $n \le x$. This is a famous result of Hardy and Ramanujan (we will develop their proof in section 13.4).


In section 4.15 we saw that although the value of τ(n) bounces around a lot, the mean value looks like $\log n + 2\gamma - 1$, with a very small error term. By exercise we have
$$2^{\omega(n)} \le \tau(n) \le 2^{\Omega(n)},$$
and we have just shown in exercise 13.2.3 that ω(n) and Ω(n) are close to log log n for almost all integers n. This implies that
$$\tau(n) \text{ is roughly } 2^{\log\log n} = (\log n)^{\log 2}$$
for almost all integers n.

Since log 2 = 0.693..., this seems wildly inconsistent compared to what we know about the mean value of τ(n).
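Both the inequalities $2^{\omega(n)} \le \tau(n) \le 2^{\Omega(n)}$ and the gap between the typical size and the average size of τ(n) can be observed numerically. This sketch uses $x = 10^5$, chosen for speed. It checks the inequalities for every $n \le x$, then prints the mean and the median of τ(n) next to $\log x + 2\gamma - 1$ and $(\log x)^{\log 2}$, leaving the comparison to the reader.

```python
import math

EULER_GAMMA = 0.5772156649015329


def tables(x):
    """Lists tau, omega, Omega for 0..x (index 0 unused)."""
    tau = [0] * (x + 1)
    for dd in range(1, x + 1):                  # each dd divides dd, 2dd, 3dd, ...
        for m in range(dd, x + 1, dd):
            tau[m] += 1
    spf = list(range(x + 1))                    # smallest-prime-factor sieve
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for q in range(p * p, x + 1, p):
                if spf[q] == q:
                    spf[q] = p
    small, big = [0] * (x + 1), [0] * (x + 1)
    for n in range(2, x + 1):
        p = spf[n]
        m = n // p
        big[n] = big[m] + 1
        small[n] = small[m] + (m % p != 0)      # a new distinct prime only if p does not divide m
    return tau, small, big


x = 10 ** 5
tau, small, big = tables(x)
assert all(2 ** small[n] <= tau[n] <= 2 ** big[n] for n in range(1, x + 1))

mean_tau = sum(tau[1:]) / x
median_tau = sorted(tau[1:])[x // 2]
print(mean_tau, math.log(x) + 2 * EULER_GAMMA - 1)
print(median_tau, math.log(x) ** math.log(2))
```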


Exercise 13.2.4. Explain, by creating a simpler but analogous example, how it is possible that τ(n) can usually take values around $(\log n)^{\log 2}$, but averages about log n. (You might think of 100 students taking an exam in which most do poorly, but one does well.)


13.3. The multiplication table problem 


When you were young, you probably had to learn the multiplication table by heart, with perhaps all the values of a × b for 1 ≤ a, b ≤ 12 written in a grid like

  × |   1   2   3   4   5   6   7   8   9  10  11  12
----+------------------------------------------------
  1 |   1   2   3   4   5   6   7   8   9  10  11  12
  2 |   2   4   6   8  10  12  14  16  18  20  22  24
  3 |   3   6   9  12  15  18  21  24  27  30  33  36
  4 |   4   8  12  16  20  24  28  32  36  40  44  48
  5 |   5  10  15  20  25  30  35  40  45  50  55  60
  6 |   6  12  18  24  30  36  42  48  54  60  66  72
  7 |   7  14  21  28  35  42  49  56  63  70  77  84
  8 |   8  16  24  32  40  48  56  64  72  80  88  96
  9 |   9  18  27  36  45  54  63  72  81  90  99 108
 10 |  10  20  30  40  50  60  70  80  90 100 110 120
 11 |  11  22  33  44  55  66  77  88  99 110 121 132
 12 |  12  24  36  48  60  72  84  96 108 120 132 144


If you were Paul Erdős, you might have quickly become bored waiting for the others to learn it and asked yourself other questions. For example, how many different integers are there in the table? There is the obvious symmetry down the diagonal (as a × b = b × a), meaning that one need only look at the upper triangle for distinct entries. One spots other coincidences, like 3 × 4 = 2 × 6 and 4 × 5 = 2 × 10, and wonders how many there are. In other words, what percentage of the integers up to $N^2$ appear in the N-by-N multiplication table? More precisely define
$$p(N) := \frac1{N^2}\,\#\{n \le N^2 : \text{there exist } a, b \le N \text{ for which } ab = n\}.$$
For N = 6 there are 18 distinct entries; that is, p(6) = 1/2. For N = 10 there are 42 distinct entries, so that p(10) = .42. For N = 12 we have 59, so p(12) = .41. Then p(25) = .36, p(50) = .32, p(75) = .306, p(100) = .291, p(250) = .270, p(500) = .259, p(1000) ≈ .248. Do these proportions tend to a limit and, if so, can one guess what the limit is? Erdős proved that the limit exists and equals 0, and his proof is beautiful.¹
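The proportions quoted above can be recomputed directly. The sketch below is our own code. It builds a Python set of the products ab with $a \le b \le N$, and `proportion(N)` plays the role of p(N). It reproduces the entry counts for N = 6, 10, and 12.

```python
def distinct_products(N):
    """Number of distinct entries in the N-by-N multiplication table."""
    return len({a * b for a in range(1, N + 1) for b in range(a, N + 1)})


def proportion(N):
    """p(N): the fraction of the integers up to N^2 that appear in the table."""
    return distinct_products(N) / N ** 2


for N in (6, 10, 12, 25, 50, 100, 250, 500, 1000):
    print(N, distinct_products(N), round(proportion(N), 3))
```

For N = 1000 the loop forms about half a million products, so this direct approach is only practical for moderate N.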


The idea is simply that almost all integers up to N have about log log N prime factors and so the product of two such integers has about 2 log log N prime factors. However almost all integers up to $N^2$ have about $\log\log N^2 = \log\log N + \log 2$ prime factors and so cannot be the product of two typical integers $\le N$.


Exercise 13.3.1. Give a more formal version of Erdős’s proof.


¹True mathematicians are motivated by elegant proofs, none more so than the great Paul Erdős. He used to say that “the supreme being” keeps a book which contains all of the most beautiful proofs of each theorem. We mortals are only occasionally allowed to glimpse this book, when we discover an extraordinary proof. Erdős’s proof of the multiplication table theorem is truly from the book (and see for more examples).


13.4. Hardy and Ramanujan’s inequality 


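Hardy and Ramanujan proved the general upper bound: For all $k \ge 1$ we have
$$\pi(x,k) := \#\{n \le x : \omega(n) = k\} \le c_0\,\frac{x}{\log x}\,\frac{(\log\log x + c_1)^{k-1}}{(k-1)!} \tag{13.4.1}$$
for certain constants $c_0$ and $c_1$. This is not difficult to prove by induction on $k \ge 1$. For k = 1 this follows from Chebyshev’s upper bound on π(x) given in (5.5.1).

For larger k, suppose that $n = p_1^{e_1}\cdots p_k^{e_k} \le x$ with $p_1^{e_1} < \cdots < p_k^{e_k}$. If $j \le k-1$, then $(p_j^{e_j})^2 < p_j^{e_j}p_k^{e_k} \le n \le x$ and so $p_j^{e_j} < \sqrt{x}$. Moreover $m_j = n/p_j^{e_j}$ satisfies $m_j \le x/p_j^{e_j}$ and $\omega(m_j) = k - 1$. Each such n therefore arises once for each of the k − 1 choices of j, and so
$$(k-1)\,\pi(x,k) \le \sum_{p^e\le\sqrt{x}}\pi\Big(\frac{x}{p^e}, k-1\Big) \le \sum_{p^e\le\sqrt{x}}c_0\,\frac{x/p^e}{\log(x/p^e)}\,\frac{(\log\log(x/p^e) + c_1)^{k-2}}{(k-2)!}$$
by the induction hypothesis. We now use the bound $\log\log(x/p^e) \le \log\log x$, and
$$\sum_{p^e\le\sqrt{x}}\frac{1}{p^e\log(x/p^e)} \le \frac{\log\log x + c_1}{\log x}, \tag{13.4.2}$$
which follows (after some work) from (5.12.4) with $c_1$ sufficiently large, to obtain the desired bound (13.4.1).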


Exercise 13.4.1.1 Give another proof that “almost all” integers n have about log log n prime factors using (13.4.1).


We noted earlier that ω(n) and Ω(n) are typically close together, so we might expect that π(x,k) and $N(x,k) := \#\{n \le x : \Omega(n) = k\}$ are close. This is true when $k \le (1+\varepsilon)\log\log x$. For larger k we now give a lower bound on N(x,k) that is far larger than the upper bound (13.4.1) on π(x,k):

Consider the set of integers $n \le x$ of the form $2^{k-\ell}m$ where $\Omega(m) = \ell$ and $m \le x/2^{k-\ell}$. Therefore $\Omega(n) = k - \ell + \Omega(m) = k$ and so
$$N(x,k) \ge N(x/2^{k-\ell}, \ell).$$
We noted earlier (see exercise 13.1.1) that the largest values of π(X,ℓ) and N(X,ℓ) occur when ℓ is close to log log X, in which case there exists a constant $c_3 > 0$ such that $\pi(X,\ell), N(X,\ell) \ge c_3X/\sqrt{\log\log X}$ if $X = x/2^{k-\ell}$ is sufficiently large. Now suppose that $\sqrt{\log x} \ge k \ge \ell$, so that if ε > 0, then $x \ge X = x/2^{k-\ell} \ge x^{1-\varepsilon}$ once x is sufficiently large. Therefore, we deduce that
$$N(x,k) \ge \frac{c_3X}{\sqrt{\log\log X}} \ge \frac{c_3\,x}{2^{k-\ell}\sqrt{\log\log x}} \ge c_4\,\frac{x}{2^k}\,(\log x)^{\log 2}, \tag{13.4.3}$$
for some constants $c_3, c_4 > 0$, when $\ell \ge \log\log x + \log\log\log x$.
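A small computation makes the contrast between π(x,k) and N(x,k) concrete. For $x = 10^5$ (our choice) no $n \le x$ has more than 6 distinct prime factors, since $2\cdot3\cdot5\cdot7\cdot11\cdot13\cdot17 > 10^5$. Yet integers with Ω(n) = k exist for every k up to 16, because $2^{16} \le 10^5$. The sketch also checks the inequality $N(x,k) \ge N(x/2^{k-\ell}, \ell)$ used above.

```python
import bisect
import math


def omega_tables(x):
    """Lists (omega, Omega) for 0..x via a smallest-prime-factor sieve."""
    spf = list(range(x + 1))
    for p in range(2, math.isqrt(x) + 1):
        if spf[p] == p:
            for q in range(p * p, x + 1, p):
                if spf[q] == q:
                    spf[q] = p
    small, big = [0] * (x + 1), [0] * (x + 1)
    for n in range(2, x + 1):
        p = spf[n]
        m = n // p
        big[n] = big[m] + 1
        small[n] = small[m] + (m % p != 0)
    return small, big


x = 10 ** 5
small, big = omega_tables(x)
max_k = max(big)
pi_xk = [0] * (max_k + 1)          # pi(x, k) = #{n <= x : omega(n) = k}
N_xk = [0] * (max_k + 1)           # N(x, k)  = #{n <= x : Omega(n) = k}
positions = [[] for _ in range(max_k + 1)]
for n in range(1, x + 1):
    pi_xk[small[n]] += 1
    N_xk[big[n]] += 1
    positions[big[n]].append(n)    # appended in increasing order, so each list is sorted


def N(X, k):
    """N(X, k) for X <= x, by binary search."""
    return bisect.bisect_right(positions[k], X)


for k in range(1, max_k + 1):
    print(k, pi_xk[k], N_xk[k])
```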


Exercise 13.4.2.4 Let $k = [\lambda\log\log x]$. Use (13.4.1) together with exercise d) to give an upper bound on π(x,k), and then use (13.4.3) to give a lower bound on N(x,k). Deduce that once λ satisfies $1 + \lambda(\log\lambda - 1) > (\lambda - 1)\log 2$, then $N(x,k) > 2\pi(x,k)$ for x sufficiently large.


Appendix 13A. Other anatomies


There are features of the anatomies of certain other mathematical objects, when 
broken up into their indecomposable components, that are very similar to the in- 
tegers. We explore that briefly here; for more information, presented in a rather 
different format, see the graphic novel [2]. 


13.5. The anatomy of polynomials in finite fields 


Monic polynomials (over $\mathbb{C}$ or in $\mathbb{F}_p[x]$) can be factored in a unique way (up to order) into monic irreducible polynomials. There are $p^n$ monic polynomials of degree n in $\mathbb{F}_p[x]$, and in (4.12.3) of appendix 4C, we showed the number of monic irreducible polynomials of degree n in $\mathbb{F}_p[x]$ is $\frac1n\sum_{d\mid n}\mu(d)p^{n/d}$. This is close to $p^n/n$; that is, roughly 1 out of every n polynomials of degree n is irreducible. We “calibrate” this with the proportion 1/log x of integers up to x that are prime to compare the anatomies of polynomials in finite fields with those of integers.


In appendix 4D we showed that, on average, integers $\le x$ have about log x divisors. For the analogous result, note that if a given monic polynomial f(x) of degree m divides a monic polynomial of degree n, then it can be written as f(x)g(x) for some monic polynomial g(x) of degree n − m. Therefore, the average number of monic polynomials dividing a monic polynomial of degree n is
$$\frac1{p^n}\sum_{\substack{h(x)\text{ monic}\\ \text{of degree } n}}\ \sum_{m=0}^{n}\ \sum_{\substack{f(x)\text{ monic of degree } m\\ f(x)\text{ divides } h(x)}}1 = \frac1{p^n}\sum_{m=0}^{n}\ \sum_{\substack{f(x)\text{ monic}\\ \text{of degree } m}}\ \sum_{\substack{g(x)\text{ monic}\\ \text{of degree } n-m}}1 = \frac1{p^n}\sum_{m=0}^{n}p^m\cdot p^{n-m} = n + 1,$$
which was calibrated with log x + 1.
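These counts are small enough to verify by brute force over $\mathbb{F}_p$. In this sketch a monic polynomial is a tuple of coefficients, constant term first, with leading coefficient 1; this representation is our own. The sketch checks the formula for the number of monic irreducibles and the average number of monic divisors, n + 1, for a few small p and n.

```python
from itertools import product


def monic_polys(p, n):
    """All monic polynomials of degree n over F_p, as coefficient tuples (constant term first)."""
    for low in product(range(p), repeat=n):
        yield low + (1,)


def divides(f, h, p):
    """True if the monic polynomial f divides h over F_p (schoolbook long division)."""
    r = list(h)
    df = len(f) - 1
    for i in range(len(r) - 1, df - 1, -1):
        coef = r[i] % p
        if coef:
            for j in range(df + 1):          # subtract coef * x^(i-df) * f
                r[i - df + j] = (r[i - df + j] - coef * f[j]) % p
    return all(c % p == 0 for c in r[:df])   # the remainder has degree < df


def is_irreducible(h, p):
    n = len(h) - 1
    return not any(divides(f, h, p) for m in range(1, n // 2 + 1) for f in monic_polys(p, m))


def mobius(n):
    result, k = 1, 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0
            result = -result
        k += 1
    return -result if n > 1 else result


def gauss_count(p, n):
    """(1/n) * sum over d | n of mu(d) p^(n/d): the number of monic irreducibles of degree n."""
    return sum(mobius(d) * p ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


def average_divisors(p, n):
    """Average number of monic divisors of a monic polynomial of degree n over F_p."""
    total = sum(divides(f, h, p)
                for h in monic_polys(p, n)
                for m in range(n + 1)
                for f in monic_polys(p, m))
    return total / p ** n


for p in (2, 3):
    for n in range(1, 5):
        brute = sum(is_irreducible(h, p) for h in monic_polys(p, n))
        print(p, n, brute, gauss_count(p, n), average_divisors(p, n))
```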




The number of monic polynomials of degree n with exactly two irreducible monic polynomial factors is
$$\frac12\sum_{d=1}^{n-1}\ \sum_{\substack{f(x)\text{ monic irreducible}\\ \text{of degree } d}}\ \sum_{\substack{g(x)\text{ monic irreducible}\\ \text{of degree } n-d}}1,$$
less the cases where f = g. Now the formula for the number of monic irreducibles is complicated so let’s just work with main terms, so we see that the above is roughly
$$\frac12\sum_{d=1}^{n-1}\frac{p^d}{d}\cdot\frac{p^{n-d}}{n-d} = \frac{p^n}{2n}\sum_{d=1}^{n-1}\Big(\frac1d + \frac1{n-d}\Big) = \frac{p^n}{n}\sum_{d=1}^{n-1}\frac1d \approx \frac{p^n}{n}\log n.$$
This can be compared with the number of integers $\le x$ with exactly two prime factors, $\sim \frac{x}{\log x}\log\log x$, which gives the same proportion of the total number, replacing n by log x. A similar argument yields that the number of monic polynomials of degree n in $\mathbb{F}_p[x]$ with exactly k irreducible monic polynomial factors is roughly
$$\frac{p^n}{n}\cdot\frac{(\log n)^{k-1}}{(k-1)!},$$
at least if k is not too large (in terms of n).

Let ω(F) denote the number of distinct monic irreducible factors of F. The mean value of ω(F), over monic F of degree n, is
$$\frac1{p^n}\sum_{\substack{F(x)\text{ monic}\\ \text{of degree } n}}\ \sum_{m=1}^{n}\ \sum_{\substack{g(x)\text{ monic irreducible of degree } m\\ g\text{ divides } F}}1 = \frac1{p^n}\sum_{m=1}^{n}\ \sum_{\substack{g(x)\text{ monic irreducible}\\ \text{of degree } m}}\ \sum_{\substack{F(x)\text{ monic of degree } n\\ g\text{ divides } F}}1$$
$$= \sum_{m=1}^{n}\ \sum_{\substack{g(x)\text{ monic irreducible}\\ \text{of degree } m}}\frac1{p^m} = \sum_{m=1}^{n}\frac1m\sum_{d\mid m}\mu(d)\,\frac{p^{m/d}}{p^m} \approx \log n,$$
taking only the d = 1 terms.


One can then prove, in one of several ways (analogous to how we approached the prime factors of integers), that the variance is also about log n, and so almost all polynomials of degree n in $\mathbb{F}_p[x]$ have about log n distinct monic irreducible factors.

Exercise 13.5.1. Sketch a proof that almost all polynomials in $\mathbb{F}_p[x]$ of degree 2d are not the product of two polynomials of degree d, as d gets large.


13.6. The anatomy of permutations 


Permutations can be represented as a product of cycles in a unique way, and a given set of cycles defines a permutation. A cycle is an irreducible permutation. Let $S_N$ be the set of permutations on N letters. The number of permutations on N letters is $|S_N| = N!$. There are $(N-1)!$ cycles on N letters, since the first letter can be sent to any of the other N − 1 letters, that letter to any of the N − 2 remaining letters, etc. The cycles form a proportion $(N-1)!/|S_N| = (N-1)!/N! = 1/N$ of all the permutations in $S_N$. We “calibrate” this with the proportion 1/log x of integers up to x that are prime to compare the anatomies of permutations with those of integers.


If $n = p_1\cdots p_k$, the factorization of squarefree n into primes, then each divisor can be written as $p_{j(1)}\cdots p_{j(\ell)}$ for some subset $\{j(1),\dots,j(\ell)\}$ of $\{1,\dots,k\}$ (and each such product gives a divisor of n). In this language, the analogy for permutations would therefore be: If $\sigma = C_1\cdots C_k$, the factorization of σ into cycles, then each divisor can be written as $C_{j(1)}\cdots C_{j(\ell)}$ for some subset $\{j(1),\dots,j(\ell)\}$ of $\{1,\dots,k\}$. This set of cycles acts on some subset S of the N letters, permuting the elements of S (and of the complementary set, T). Thus the divisors of $\sigma \in S_N$ are the subsets S of the N letters that are fixed by σ. If σ is a cycle on all N letters, then the only subsets it fixes are ∅ and the whole set of letters, very much in analogy with how we define primes. The average number of divisors of a permutation of a set A of N letters is
$$\frac1{N!}\sum_{\sigma\in S_N}\ \sum_{\substack{S\cup T = A\\ \sigma\text{ fixes } S\text{ and } T}}1.$$
If |S| = k, then there are $k!\cdot(N-k)!$ permutations of S and T, and so this is the number of $\sigma \in S_N$ which fix S and T. Therefore the above equals
$$\frac1{N!}\sum_{k=0}^{N}\ \sum_{\substack{S\subseteq A\\ |S| = k}}k!\,(N-k)! = \frac1{N!}\sum_{k=0}^{N}\binom Nk k!\,(N-k)! = N + 1.$$
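The claim that the average number of "divisors" of a permutation is N + 1 can be checked exhaustively for small N. The sketch below counts the subsets fixed by σ. It also confirms that this count is $2^{(\text{number of cycles of } \sigma)}$, the analogue of $\tau(n) = 2^{\omega(n)}$ for squarefree n. Permutations of {0, ..., N − 1} are tuples, which is our own convention.

```python
from itertools import combinations, permutations
from math import factorial


def num_cycles(sigma):
    """Number of cycles of a permutation of {0, ..., N-1} given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(sigma)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = sigma[j]
    return cycles


def invariant_subsets(sigma):
    """Number of subsets S of the letters with sigma(S) = S (the empty set and all letters included)."""
    N = len(sigma)
    count = 0
    for k in range(N + 1):
        for S in combinations(range(N), k):
            members = set(S)
            if all(sigma[i] in members for i in S):
                count += 1
    return count


for N in range(1, 7):
    total = 0
    for sigma in permutations(range(N)):
        t = invariant_subsets(sigma)
        assert t == 2 ** num_cycles(sigma)     # analogue of tau(n) = 2^omega(n), n squarefree
        total += t
    assert total == (N + 1) * factorial(N)     # so the average is exactly N + 1
    print(N, total // factorial(N))
```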


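The number of permutations with exactly two cycles is
$$\frac12\sum_{k=1}^{N-1}(k-1)!\,(N-k-1)!\sum_{\substack{S\cup T = A\\ |S| = k}}1 = \frac12\sum_{k=1}^{N-1}(k-1)!\,(N-k-1)!\binom Nk = \frac{N!}2\sum_{k=1}^{N-1}\frac1{k(N-k)}$$
$$= \frac{N!}{2N}\sum_{k=1}^{N-1}\Big(\frac1k + \frac1{N-k}\Big) = \frac{N!}{N}\sum_{k=1}^{N-1}\frac1k \approx \frac{N!}{N}\log N.$$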
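For small N one can count the permutations with exactly two cycles directly. The sketch compares that count with the exact expression $\frac{N!}{N}\sum_{k=1}^{N-1}\frac1k$ obtained above, computed here with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def num_cycles(sigma):
    """Number of cycles of a permutation of {0, ..., N-1} given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(sigma)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = sigma[j]
    return cycles


def exactly_two_cycles(N):
    return sum(1 for s in permutations(range(N)) if num_cycles(s) == 2)


def formula(N):
    """(N!/N) * (1 + 1/2 + ... + 1/(N-1)), as an exact fraction."""
    return Fraction(factorial(N), N) * sum(Fraction(1, k) for k in range(1, N))


for N in range(2, 8):
    print(N, exactly_two_cycles(N), formula(N))
```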


A similar argument yields that the number of permutations with exactly k cycles is
$$\frac{N!}{N}\cdot\frac1{(k-1)!}\sum_{\substack{a_1,\dots,a_{k-1}\ge1\\ a_1+\cdots+a_{k-1}\le N-1}}\frac1{a_1\cdots a_{k-1}} \approx \frac{N!}{N}\cdot\frac{(\log N)^{k-1}}{(k-1)!},$$
at least if k is not too large (in terms of N), as we prove in the following exercise:


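Exercise 13.6.1. (a) Prove that
$$\Big(\sum_{a=1}^{\lfloor m/k\rfloor}\frac1a\Big)^k \le \sum_{\substack{a_1,\dots,a_k\ge1\\ a_1+\cdots+a_k\le m}}\frac1{a_1\cdots a_k} \le \Big(\sum_{a=1}^{m}\frac1a\Big)^k.$$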
apes 


(b)? Prove that if m < donee? then the two terms at either end of the inequalities in (a) 


differ by a multiplicative factor which gets arbitrarily close to 1 as A grows. 


We will now determine the average number of cycles in a permutation. First note that the number of permutations containing a given cycle C of length k is (N − k)!, since one determines all the ways that σ can act on the N − k letters not acted on by C. The number of cycles of length k is $\binom Nk(k-1)! = \frac{N!}{k\,(N-k)!}$. Therefore the average number of cycles per permutation of $S_N$ is
$$\frac1{N!}\sum_{\sigma\in S_N}\ \sum_{\substack{C\text{ a cycle}\\ C\in\sigma}}1 = \frac1{N!}\sum_{k=1}^{N}\ \sum_{\substack{C\text{ a cycle}\\ |C| = k}}\ \sum_{\substack{\sigma\in S_N\\ C\in\sigma}}1 = \frac1{N!}\sum_{k=1}^{N}\frac{N!}{k\,(N-k)!}\cdot(N-k)! = \sum_{k=1}^{N}\frac1k \approx \log N.$$
k=1 k=1 
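To determine the variance, we calculate
$$\frac1{N!}\sum_{\sigma\in S_N}\Big(\sum_{\substack{C\text{ a cycle}\\ C\in\sigma}}1\Big)^2 = \frac1{N!}\sum_{\sigma\in S_N}\ \sum_{\substack{C\text{ a cycle}\\ C\in\sigma}}1 + \frac1{N!}\sum_{\sigma\in S_N}\ \sum_{\substack{C, D\text{ disjoint cycles}\\ C\cup D\in\sigma}}1.$$
We just calculated the first term. For the second we note that given C ∪ D with |C| = k, |D| = ℓ, the number of $\sigma \in S_N$ with $C \cup D \in \sigma$ is $(N - (k+\ell))!$. The number of pairs of disjoint cycles C ∪ D with |C| = k, |D| = ℓ is
$$\binom Nk(k-1)!\cdot\binom{N-k}{\ell}(\ell-1)! = \frac{N!}{k\ell\,(N-k-\ell)!}.$$
Therefore the second term in the last displayed equation equals
$$\frac1{N!}\sum_{\substack{k,\ell\ge1\\ k+\ell\le N}}\frac{N!}{k\ell\,(N-k-\ell)!}\cdot(N-(k+\ell))! = \sum_{\substack{k,\ell\ge1\\ k+\ell\le N}}\frac1{k\ell}.$$
Therefore the variance equals
$$\sum_{k=1}^{N}\frac1k + \sum_{\substack{k,\ell\ge1\\ k+\ell\le N}}\frac1{k\ell} - \Big(\sum_{k=1}^{N}\frac1k\Big)^2 \le \sum_{k=1}^{N}\frac1k \approx \log N.$$
We deduce that almost all permutations on N letters have about log N cycles.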
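The mean and the exact variance formula can likewise be verified for small N by enumerating $S_N$. This sketch uses exact fractions, so the comparisons are equalities rather than approximations.

```python
from fractions import Fraction
from itertools import permutations


def num_cycles(sigma):
    """Number of cycles of a permutation of {0, ..., N-1} given as a tuple."""
    seen, cycles = set(), 0
    for start in range(len(sigma)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = sigma[j]
    return cycles


def cycle_stats(N):
    """Exact mean and variance of the number of cycles over all of S_N."""
    counts = [num_cycles(s) for s in permutations(range(N))]
    mean = Fraction(sum(counts), len(counts))
    variance = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, variance


for N in range(1, 8):
    mean, var = cycle_stats(N)
    H = sum(Fraction(1, k) for k in range(1, N + 1))
    pairs = sum(Fraction(1, k * l) for k in range(1, N) for l in range(1, N - k + 1))  # k + l <= N
    assert mean == H
    assert var == H + pairs - H ** 2
    assert var <= H
    print(N, mean, var)
```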


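Exercise 13.6.2. Prove, by taking m = k + ℓ, that
$$\Big(\sum_{k=1}^{N}\frac1k\Big)^2 - \sum_{\substack{k,\ell\ge1\\ k+\ell\le N}}\frac1{k\ell} = \sum_{\substack{1\le k,\ell\le N\\ k+\ell>N}}\frac1{k\ell} = \sum_{m=N+1}^{2N}\frac1m\sum_{k=m-N}^{N}\Big(\frac1k + \frac1{m-k}\Big).$$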


There are many other aspects of the anatomies of polynomials in finite fields, 
and of permutations, that mirror the anatomy of integers. 


More on mathematical anatomies 
[1] Richard Arratia, A. D. Barbour, and Simon Tavaré, Random combinatorial structures and prime 
factorizations, Notices Amer. Math. Soc. 44 (1997), 903-910. 


[2] Andrew Granville, Jennifer Granville, and Robert J. Lewis, Prime suspects: The anatomy of integers 
and permutations, Princeton University Press, 2019. 


[3] Anatoly M. Vershik, Asymptotic combinatorics and algebraic analysis, Proc ICM Zurich (1994), 
1384-1394. 


Appendix 13B. Dirichlet L-functions


In section 8.17 of appendix 8D we noted that the infinite series for the Dirichlet L-function L(s,χ) is absolutely convergent whenever Re(s) > 1. However, in the subsequent sketch of Dirichlet’s proof that there are infinitely many primes in arithmetic progressions a (mod q) with (a,q) = 1, it was necessary to establish that every L(s,χ) with χ non-principal can be analytically continued into a region that contains s = 1. We will prove this here.


13.7. Dirichlet series 


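To determine the value of L(1,χ), we sum its series in order, taking q terms at a time:
$$\sum_{n=kq+1}^{kq+q}\frac{\chi(n)}{n} = \frac1{kq+1}\sum_{n=kq+1}^{kq+q}\chi(n) + \sum_{n=kq+1}^{kq+q}\chi(n)\Big(\frac1n - \frac1{kq+1}\Big).$$
Summing up the first term here gives $\frac1{kq+1}\sum_{n=kq+1}^{kq+q}\chi(n) = 0$ as we saw in (8.16.2). We bound the second term by noting that $|\chi(n)| \le 1$, so that
$$\Big|\sum_{n=kq+1}^{kq+q}\chi(n)\Big(\frac1n - \frac1{kq+1}\Big)\Big| \le \sum_{n=kq+1}^{kq+q}\Big(\frac1{kq+1} - \frac1n\Big) \le q\Big(\frac1{kq+1} - \frac1{kq+q+1}\Big).$$
Therefore, applying this for k = 1, 2, 3, ... we obtain
$$\Big|\sum_{n\ge1}\frac{\chi(n)}{n}\Big| \le \sum_{n=1}^{q}\frac1n + \sum_{k\ge1}\Big|\sum_{n=kq+1}^{kq+q}\frac{\chi(n)}{n}\Big| \le (\log q + 1) + \sum_{k\ge1}q\Big(\frac1{kq+1} - \frac1{kq+q+1}\Big) \le \log q + 2,$$
since the last sum is telescoping. We have proved that if one sums the series q terms at a time, then the new sum is absolutely convergent.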
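Summing q terms at a time is also a practical way to evaluate L(1,χ). This sketch uses two example characters of our choosing. The first is the non-principal character mod 4, for which L(1,χ) = π/4 (Leibniz's series). The second is the Legendre symbol mod 5, for which the class number formula gives $L(1,\chi) = \frac{2}{\sqrt5}\log\frac{1+\sqrt5}{2}$. The truncation point K is arbitrary.

```python
import math


def chi_mod4(n):
    """The non-principal character mod 4."""
    return (0, 1, 0, -1)[n % 4]


def chi_mod5(n):
    """The Legendre symbol (n/5), a non-principal character mod 5."""
    return (0, 1, -1, -1, 1)[n % 5]


def L1_by_blocks(chi, q, K):
    """Sum over k = 0, ..., K-1 of the block sum_{n=kq+1}^{kq+q} chi(n)/n."""
    total = 0.0
    for k in range(K):
        total += sum(chi(k * q + j) / (k * q + j) for j in range(1, q + 1))
    return total


L4 = L1_by_blocks(chi_mod4, 4, 100_000)
L5 = L1_by_blocks(chi_mod5, 5, 100_000)
print(L4, math.pi / 4, math.log(4) + 2)
print(L5, 2 * math.log((1 + math.sqrt(5)) / 2) / math.sqrt(5), math.log(5) + 2)
```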
We deduce from Dirichlet’s class number formula (section 12.15 of appendix 12D) that if d > 0, then
$$\log\epsilon_d \le h(d)\log\epsilon_d = \sqrt{d}\,L(1,(d/\cdot)) < \sqrt{d}\,(\log d + 2),$$
as $h(d) \ge 1$, and so $\epsilon_d < (e^2d)^{\sqrt{d}}$, which is of a similar order of magnitude to what
we observed from the data in section [1.13]in appendix 11B. 


More general s. Taking q terms at a time we can also give a valid definition of L(s,χ) for any s ∈ ℂ with Re(s) > 0 and χ non-principal, and we can justify this by much the same argument. Therefore we define
$$L(s,\chi)=\sum_{k\ge0}A_k(s,\chi),\quad\text{where}\quad A_k(s,\chi)=\sum_{j=1}^{q}\frac{\chi(kq+j)}{(kq+j)^s}.\tag{13.7.1}$$


Since χ(kq + j) = χ(j) for each j, we have
$$A_k(s,\chi)=\sum_{j=1}^{q}\left(\frac{\chi(j)}{(kq+1)^s}+\chi(j)\left(\frac{1}{(kq+j)^s}-\frac{1}{(kq+1)^s}\right)\right).$$


The first terms sum to 0. For the second term,
$$\frac{1}{(kq+j)^s}-\frac{1}{(kq+1)^s}=-s\int_1^j\frac{dt}{(kq+t)^{s+1}}.$$


Taking absolute values, where σ = Re(s) > 0, this is
$$\le|s|\int_1^j\frac{dt}{(kq+t)^{\sigma+1}}=\frac{|s|}{\sigma}\left(\frac{1}{(kq+1)^\sigma}-\frac{1}{(kq+j)^\sigma}\right),$$


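since |n^s| = n^σ. Substituting this bound in above and noting that each |χ(j)| ≤ 1, we have
$$|A_k(s,\chi)|\le\frac{|s|}{\sigma}\sum_{j=1}^{q}\left(\frac{1}{(kq+1)^\sigma}-\frac{1}{(kq+j)^\sigma}\right)\le\frac{q|s|}{\sigma}\left(\frac{1}{(kq+1)^\sigma}-\frac{1}{(kq+q+1)^\sigma}\right).$$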


Summing this over each k ≥ 0 we deduce that
$$\sum_{k\ge0}|A_k(s,\chi)|\le\frac{q|s|}{\sigma}\sum_{k\ge0}\left(\frac{1}{(kq+1)^\sigma}-\frac{1}{(kq+q+1)^\sigma}\right)=\frac{q|s|}{\sigma}.$$


Therefore the new sum in (13.7.1) defining L(s,χ) is absolutely convergent for any s with Re(s) > 0, meaning that we can unambiguously decide on its value throughout this domain.² We have analytically continued these L-functions; to do so for all s ∈ ℂ is beyond the scope of this book (but see ).


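Exercise 13.7.1. Let σ = Re(s) > 0. Prove that
$$|L(s,\chi)|\le\begin{cases}\dfrac{\sigma}{\sigma-1}&\text{if }\sigma>1,\\[4pt]\log q+|s|+1&\text{if }\sigma=1,\\[4pt]q^{1-\sigma}\left(\dfrac{1}{1-\sigma}+\dfrac{|s|}{\sigma}\right)&\text{if }0<\sigma<1.\end{cases}$$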


In section 8.17 of appendix 8D we sketched Dirichlet's proof that there are infinitely many primes in arithmetic progressions. One begins with the (absolutely convergent) Euler product representation
$$L(s,\chi)=\prod_{p\text{ prime}}\left(1-\frac{\chi(p)}{p^s}\right)^{-1}$$
for Re(s) > 1 (see section 4.9 of appendix 4B), and takes the logarithm to obtain
$$\log L(s,\chi)=\sum_{p\text{ prime}}\ \sum_{k\ge1}\frac{\chi(p)^k}{k\,p^{ks}}.$$


We have seen that |L(1,χ)| is bounded. If, also, L(1,χ) ≠ 0, then we know that log L(1,χ) takes a bounded value, and so we can let s → 1⁺ above. The terms with k ≥ 2 contribute, in absolute value, at most
$$\sum_{p}\ \sum_{k\ge2}\frac{1}{p^k}=\sum_{p}\frac{1}{p(p-1)}<\sum_{n\ge2}\frac{1}{n(n-1)}=1.$$
Therefore, we deduce that
$$\sum_{\substack{p\text{ prime}\\ p\le x}}\frac{\chi(p)}{p}\ \text{ remains bounded as }x\to\infty$$
(as in (8.17.2)), and from there Dirichlet's proof is given in section 8.17 of appendix 8D. It remains to prove that L(1,χ) ≠ 0 for every non-principal Dirichlet character χ (mod q).


Exercise 13.7.2. (a) Prove that if σ > 1, then
$$\frac{1}{\phi(q)}\sum_{\chi\pmod q}\log L(\sigma,\chi)=\sum_{\substack{p\text{ prime},\ k\ge1\\ p^k\equiv1\pmod q}}\frac{1}{k\,p^{k\sigma}}.$$
(b) Deduce that $\prod_{\chi\pmod q}L(s,\chi)$ is non-zero at s = 1.
(c) Prove that if L(1,χ) = 0, then L(1,χ̄) = 0.
(d)† Deduce that if L(1,χ) = 0, then χ is real.


The last exercise leaves us only needing to prove that L(1,(d/·)) ≠ 0. This task presented an enormous challenge to Dirichlet but culminated in his brilliant class number formulas (see section 12.15 of appendix 12D), which established that L(1,(d/·)) ≠ 0 as each class number h(d) ≥ 1 (since one always has the principal form of fundamental discriminant d).


2 What is remarkable is that no matter how we rewrite L(s,χ), the new expression takes the same value in the wider domain as any other, provided the new definition converges absolutely in that domain.




Further exercises 


Exercise 13.7.3. (a) Show that for any integer a, 1 ≤ a ≤ 9, there are ~ x/9 integers ≤ x whose leading digit is a, where x = 10^n and integer n → ∞.

(b) Show that there are ~ 5x/9 integers ≤ x whose leading digit is 1, where x = 2·10^n and integer n → ∞.

(c) What can we say about the density of integers whose leading digit is 1?

(d) The logarithmic density of a set S of positive integers up to x is given by $\frac{1}{\log x}\sum_{n\le x,\,n\in S}\frac1n$. For any given integer a, 1 ≤ a ≤ 9, let S_a be the set of integers with leading digit a. Prove that the logarithmic density of S_a, namely
$$\lim_{x\to\infty}\frac{1}{\log x}\sum_{\substack{n\le x\\ n\in S_a}}\frac1n,$$
exists and equals $\dfrac{\log(1+\frac1a)}{\log 10}$.


Exercise 13.7.4. Show that Ti, nol = 2 (1 t xa): 
Exercise 13.7.5.† (a) Use Theorem 5.3 to establish that there exist constants 0 < c_1 < c_2 such that if x ≥ 2, then


lo 
ca< x ae <2. 
a<pi3sx 
(b) Deduce that there exist constants 0 < c_3 < c_4 such that if x ≥ 6, then


1 
c3 logxa < ~ aoe < ca log. 
psx 


(c) In section [13.1]we claimed that 7 ,< yz — is well-approximated by ieee D p<va . 
p prime p prime 
Show that there exists a constant c5 > 0 such that the difference between these two expres- 


sions is < c52. 


(d) Prove (13.4.2). 


Exercise 13.7.6. (a) Show that every integer n can be written as mr where m is powerful, r is squarefree, and (m,r) = 1; and deduce that Ω(n) − ω(n) = Ω(m) − ω(m).
(b) Prove that there are ≤ x/m integers n ≤ x of the form mr as in (a).
(c) Prove that if Ω(m) − ω(m) ≥ k, then m ≥ 2^{k+1}. Deduce that
$$\frac1x\,\#\{n\le x:\ \Omega(n)-\omega(n)\ge k\}\le\sum_{\substack{m\text{ powerful}\\ m\ge2^{k+1}}}\frac1m.$$
(d) Prove that every powerful number m can be written as a²b³ for some integers a and b.
(e) Deduce that if a²b³ ≥ 2^{k+1}, then a ≥ 2^{k/4} or b ≥ 2^{k/6}, and therefore that
$$\sum_{\substack{a,b\ge1\\ a^2b^3\ge2^{k+1}}}\frac{1}{a^2b^3}\le\sum_{\substack{a\ge2^{k/4}\\ b\ge1}}\frac{1}{a^2b^3}+\sum_{\substack{a\ge1\\ b\ge2^{k/6}}}\frac{1}{a^2b^3}\ll\frac{1}{2^{k/4}}.$$
(f) Deduce that there are ≪ x/2^ℓ integers n ≤ x for which Ω(n) − ω(n) ≥ 4ℓ + 3.


Exercise 13.7.7. (a) Use (13.4.1) to show that
$$\#\{a,b\le x:\ \omega(a)+\omega(b)=k\}\ll\frac{x^2}{(\log x)^2}\cdot\frac{(2(\log\log x+c_1))^{k-2}}{(k-2)!}.$$
(b) Write K := ⌊log log x / log 2⌋ and let δ = 1 − (1 + log log 2)/log 2 = 0.086071.... Use Stirling's formula to prove that there exists a constant c_6 such that
$$\#\{a,b\le x:\ \omega(a)+\omega(b)\le K\}+\#\{n\le x^2:\ \omega(n)\ge K\}\le c_6\,\frac{x^2}{(\log x)^{\delta}}.$$
(c) Use this result to more or less justify the claim that there are ≤ c_6 N²/(log N)^δ distinct integers in the N-by-N multiplication table.


Chapter 14


Counting integral and rational points on curves, modulo p


A planar curve is defined to be an equation f(x,y) = 0 in two variables, x and y. In this chapter we are interested in identifying the number of solutions u, v (mod p) to f(u,v) ≡ 0 (mod p).


For linear equations, this is easy. If p does not divide a, say, then
$$\#\{(x,y)\pmod p:\ ax+by\equiv c\pmod p\}=p,$$
since we have a unique value of x (mod p), namely (c − by)/a (mod p), for any given y (mod p).


14.1. Diagonal quadratics 


We can take the same approach to the congruence ax² + by² ≡ c (mod p): The number of x (mod p) that satisfy this equation for a given value of y (mod p) is $1+\left(\frac{a(c-by^2)}{p}\right)$ by Corollary 8.1.1, and if we can sum these up, over all y, then we have our answer. This seems like a daunting task so let's take a different approach. We saw in exercise 6.1.1 that the rational points (u,v) on the curve u² + v² = 1 are in 1-to-2 correspondence with the integer solutions to the Pythagorean equation x² + y² = z² with (x,y) = 1 (taking u = x/z and v = y/z), so it seems natural that there should be some relationship between the number of solutions to either equation mod p.¹


Theorem 14.1. If p is a prime that does not divide integers a, b, c, then
$$N:=\#\{(x,y,z)\pmod p:\ ax^2+by^2+cz^2\equiv0\pmod p\}=p^2.$$

1 However a note of caution: If we take the same approach to identify rational points on the curve u² − v² = 1 with coprime integral solutions to x² − y² = z², then we have to be more careful when z = 0 (which did not occur in the first example).




Proof. We look at the solutions to u + v + w ≡ 0 (mod p) and then count the number of (x,y,z) (mod p) with u ≡ ax², v ≡ by², and w ≡ cz² (mod p). The number of x with x² ≡ u/a (mod p) is $1+\left(\frac ap\right)\left(\frac up\right)$. Therefore
$$N=\sum_{\substack{u,v,w\pmod p\\ u+v+w\equiv0\pmod p}}\left(1+\left(\frac ap\right)\left(\frac up\right)\right)\left(1+\left(\frac bp\right)\left(\frac vp\right)\right)\left(1+\left(\frac cp\right)\left(\frac wp\right)\right).$$
We now multiply out the brackets and end up with eight terms. We claim that all but the first and last obviously equal 0: To see this suppose that our term includes $\left(\frac up\right)$, but not $\left(\frac wp\right)$ (the other cases follow by symmetry). Then the condition "u + v + w ≡ 0 (mod p)" is redundant, since we take w ≡ −u − v (mod p) and its value does not affect the sum, and so we sum over all possible u and v (mod p). In particular $\sum_{u\pmod p}\left(\frac up\right)=0$.


There are p² terms in the sum for N (since we can pick u and v at will, and then w ≡ −u − v (mod p)), and so
$$N=p^2+\left(\frac{abc}{p}\right)\sum_{\substack{u,v,w\pmod p\\ u+v+w\equiv0\pmod p}}\left(\frac{uvw}{p}\right).$$
We will evaluate this last sum in terms of u. If u ≡ 0, then the summand is 0, so we may assume that u ≢ 0 (mod p), and then we can write r ≡ v/u and s ≡ w/u (mod p). Therefore $\left(\frac{uvw}{p}\right)=\left(\frac{u^3rs}{p}\right)=\left(\frac up\right)\left(\frac{rs}{p}\right)$ and u + v + w ≡ u(1 + r + s) (mod p). Therefore
$$\sum_{\substack{u,v,w\pmod p\\ u+v+w\equiv0\pmod p}}\left(\frac{uvw}{p}\right)=\sum_{u\not\equiv0\pmod p}\left(\frac up\right)\cdot\sum_{\substack{r,s\pmod p\\ 1+r+s\equiv0\pmod p}}\left(\frac{rs}{p}\right)=0,$$
as the first sum is 0. The result follows.


Corollary 14.1.1. If p is an odd prime that does not divide integers a, b, c, then
$$A(c):=\#\{(u,v)\pmod p:\ au^2+bv^2+c\equiv0\pmod p\}=p-\left(\frac{-ab}{p}\right).$$


Proof. We partition the solutions counted by N in the proof of Theorem 14.1 into the solutions in which z ≡ 0 (mod p), and those with z ≢ 0 (mod p). There are exactly A(0) with z ≡ 0 (mod p) (taking x = u and y = v). Otherwise we can divide through by z (mod p) to obtain a solution (x/z, y/z) (mod p) in A(c); and if we have a solution (u,v) (mod p) in A(c), we obtain p − 1 solutions (uz, vz, z) counted in N. Therefore
$$N=A(0)+(p-1)A(c).$$
Now for the case c = 0. We always have the solution u ≡ v ≡ 0 (mod p). Otherwise, given any non-zero solution (u,v) (mod p), we have v ≢ 0 (mod p), and we may divide through by v to get a solution to w² ≡ −b/a (mod p) where w ≡ u/v (mod p). From any such solution w (mod p) we can recover all non-zero solutions (u,v) ≡ (vw, v) (mod p) to au² + bv² ≡ 0 (mod p), as v runs through the reduced residues mod p. Therefore
$$A(0)=1+(p-1)\cdot\#\{w\pmod p:\ w^2\equiv-b/a\pmod p\}=1+(p-1)\left(1+\left(\frac{-ab}{p}\right)\right),$$
and the result follows on substituting this and N = p² into the previous display.


One pertinent example is that, for p ∤ 2d,
$$\#\{(x,y)\pmod p:\ x^2-dy^2\equiv1\pmod p\}=p-\left(\frac dp\right).$$
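Theorem 14.1 and Corollary 14.1.1 are easy to test by brute force for small primes. The following Python sketch (function names are illustrative) simply enumerates all residues. It is a numerical check under these small-prime assumptions, not a substitute for the proof.

```python
from itertools import product


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p (Euler's criterion)."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def count_ternary(a: int, b: int, c: int, p: int) -> int:
    """N = #{(x, y, z) mod p : a x^2 + b y^2 + c z^2 = 0 mod p}."""
    return sum(1 for x, y, z in product(range(p), repeat=3)
               if (a * x * x + b * y * y + c * z * z) % p == 0)


def count_conic(a: int, b: int, c: int, p: int) -> int:
    """A(c) = #{(u, v) mod p : a u^2 + b v^2 + c = 0 mod p}."""
    return sum(1 for u, v in product(range(p), repeat=2)
               if (a * u * u + b * v * v + c) % p == 0)


for p in (3, 5, 7):
    for a, b, c in product(range(1, p), repeat=3):
        assert count_ternary(a, b, c, p) == p * p                  # Theorem 14.1
        assert count_conic(a, b, c, p) == p - legendre(-a * b, p)  # Corollary 14.1.1
print("Theorem 14.1 and Corollary 14.1.1 hold for p = 3, 5, 7")
```

The equation x² − dy² ≡ 1 is the case a = 1, b = −d, c = −1, so for instance `count_conic(1, -2, -1, 7)` counts its solutions for d = 2 and p = 7.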


Exercise 14.1.1. Prove that if odd prime p does not divide n, then 


SA) ee 


(mod p) SP y (mod p) e 


Exercise 14.1.2. Suppose that odd prime p does not divide n.
(a) Show that there are $\frac14\left(p-3-\left(\frac np\right)\left(1+\left(\frac{-1}{p}\right)\right)\right)$ residues m (mod p) for which m and m + n are both quadratic residues mod p.
(b) Show that there are $\frac14\left(p-3+\left(\frac np\right)\left(1+\left(\frac{-1}{p}\right)\right)\right)$ residues ℓ (mod p) for which ℓ and ℓ + n are both quadratic non-residues mod p.


14.2. Counting solutions to a quadratic equation and another proof of quadratic reciprocity


We now present a proof due to Wouter Castryck, based on Lebesgue’s 1838 proof. 


Given an odd prime q, we define N_n, for any odd integer n, to be the number of solutions (x_1, ..., x_n) (mod q) to the congruence
$$x_1^2-x_2^2+x_3^2-\cdots+x_n^2\equiv1\pmod q.$$
We first prove, by induction, that
$$N_n=q^{n-1}+q^{(n-1)/2}\quad\text{for all odd integers }n\ge1.$$
For n = 1 this follows as there are two square roots of 1 (mod q). Otherwise we make the invertible change of variables X_1 = x_1 − x_2 and X_i = x_i for all i ≥ 2, to obtain the congruence
$$X_1^2+2X_1X_2+X_3^2-\cdots+X_n^2\equiv1\pmod q,$$
with the same number of solutions. Now if we select any X_3, ..., X_n (mod q) and any X_1 ≢ 0 (mod q), then there is a unique choice of X_2 (mod q), a total of (q − 1)q^{n−2} solutions. Otherwise if X_1 ≡ 0 (mod q), then we can choose any X_2 (mod q), and X_3, ..., X_n (mod q) satisfy X_3² − ⋯ + X_n² ≡ 1 (mod q); this has qN_{n−2} solutions. Therefore we have proved that
$$N_n=(q-1)q^{n-2}+qN_{n-2},$$
and then the claimed result follows by the induction hypothesis.


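We now determine N_p, for an odd prime p ≠ q, in a rather different way. If we let x_1² ≡ t_1 (mod q), x_2² ≡ −t_2 (mod q), x_3² ≡ t_3 (mod q), ..., then we have
$$N_p=\sum_{\substack{t_1,\dots,t_p\pmod q\\ t_1+t_2+\cdots+t_p\equiv1\pmod q}}\left(1+\left(\frac{t_1}{q}\right)\right)\left(1+\left(\frac{-t_2}{q}\right)\right)\left(1+\left(\frac{t_3}{q}\right)\right)\cdots\left(1+\left(\frac{t_p}{q}\right)\right).$$
If we multiply this out and sum each of the 2^p terms, we find that all but the first and last terms trivially sum to 0 (just as in the proof of Theorem 14.1). Therefore
$$N_p=q^{p-1}+\left(\frac{-1}{q}\right)^{\frac{p-1}{2}}\sum_{\substack{t_1,\dots,t_p\pmod q\\ t_1+\cdots+t_p\equiv1\pmod q}}\left(\frac{t_1\cdots t_p}{q}\right).$$
Comparing our two evaluations of N_p we obtain
$$\sum_{\substack{t_1,\dots,t_p\pmod q\\ t_1+\cdots+t_p\equiv1\pmod q}}\left(\frac{t_1\cdots t_p}{q}\right)=\left(\frac{-1}{q}\right)^{\frac{p-1}{2}}q^{\frac{p-1}{2}}.$$
We will now approach this sum, mod p, using the same idea as in the necklace proof of Fermat's Little Theorem (see appendix 7A). The idea is to group together the solutions (t_1, t_2, ..., t_p), (t_2, t_3, ..., t_p, t_1), ..., which each contribute the same amount to our sum. These are p distinct solutions unless each t_i = t, say, and so
$$\sum_{\substack{t_1,\dots,t_p\pmod q\\ t_1+\cdots+t_p\equiv1\pmod q}}\left(\frac{t_1\cdots t_p}{q}\right)\equiv\sum_{\substack{t\pmod q\\ pt\equiv1\pmod q}}\left(\frac{t^p}{q}\right)=\left(\frac pq\right)\pmod p,$$
since the unique such t is the inverse of p mod q, so that $\left(\frac{t^p}{q}\right)=\left(\frac tq\right)=\left(\frac pq\right)$. Comparing the two expressions for the sum and recalling that $\left(\frac{-1}{q}\right)=(-1)^{\frac{q-1}{2}}$, and that $q^{\frac{p-1}{2}}\equiv\left(\frac qp\right)\pmod p$ by Euler's criterion, we obtain $\left(\frac pq\right)\equiv(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\left(\frac qp\right)\pmod p$. Both sides equal 1 or −1 and p ≥ 3, so they are equal: this is the law of quadratic reciprocity.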
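Both ingredients of this proof can be checked numerically. The Python sketch below (illustrative names; small primes only, since it enumerates all q^n tuples) verifies N_n = q^{n−1} + q^{(n−1)/2}, and then checks the resulting reciprocity law $\left(\frac pq\right)=(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}}\left(\frac qp\right)$ for all pairs of distinct odd primes below 60.

```python
from itertools import product


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p (Euler's criterion)."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def count_alternating(n: int, q: int) -> int:
    """N_n = #{(x_1,...,x_n) mod q : x_1^2 - x_2^2 + x_3^2 - ... + x_n^2 = 1}."""
    signs = [1 if i % 2 == 0 else -1 for i in range(n)]
    return sum(1 for xs in product(range(q), repeat=n)
               if sum(s * x * x for s, x in zip(signs, xs)) % q == 1)


for q in (3, 5, 7):
    for n in (1, 3, 5):
        assert count_alternating(n, q) == q ** (n - 1) + q ** ((n - 1) // 2)

odd_primes = [p for p in range(3, 60) if all(p % d for d in range(2, p))]
for p in odd_primes:
    for q in odd_primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) == sign * legendre(q, p)
print("N_n formula and quadratic reciprocity verified on small cases")
```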


14.3. Cubic equations modulo p 


Given our results for linear and quadratic equations, we might guess that if p ∤ 6abc, then
$$N(a,b,c):=\#\{(x,y,z)\pmod p:\ ax^3+by^3+cz^3\equiv0\pmod p\}=p^2.$$
We can test this guess with p = 7. A calculation reveals that N(a,b,c) equals one of 19, 55, or 73 for every a, b, c, so our guess is wrong, and there is something to be understood.
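The calculation for p = 7 can be reproduced by brute force. This Python sketch (illustrative names) runs over all 6³ choices of non-zero a, b, c (mod 7) and collects the values of N(a,b,c). For a prime p ≡ 2 (mod 3), such as p = 5, it should instead always give p², in line with exercise 14.3.1.

```python
from itertools import product


def N_cubic(a: int, b: int, c: int, p: int) -> int:
    """N(a,b,c) = #{(x,y,z) mod p : a x^3 + b y^3 + c z^3 = 0 mod p}."""
    return sum(1 for x, y, z in product(range(p), repeat=3)
               if (a * x ** 3 + b * y ** 3 + c * z ** 3) % p == 0)


values_7 = sorted({N_cubic(a, b, c, 7) for a, b, c in product(range(1, 7), repeat=3)})
values_5 = sorted({N_cubic(a, b, c, 5) for a, b, c in product(range(1, 5), repeat=3)})
print(values_7)  # [19, 55, 73]
print(values_5)  # [25]
```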


Exercise 14.3.1. Let p be a prime ≡ 2 (mod 3).
(a) Prove that for every a (mod p) there is exactly one b (mod p) for which b³ ≡ a (mod p).
(b) Deduce that if p ∤ abc, then N(a,b,c) = p².


Henceforth we will assume that p ≡ 1 (mod 3). We can proceed as in the proof of Theorem 14.1, which leads us to Jacobi sums (which we will explore in sections 14.9 and 14.10 of appendix 14A), though here we will take a different route.


Exercise 14.3.2. Let p be a prime ≡ 1 (mod 3) with a primitive root g, and suppose p ∤ abc.
(a) Show that if p ∤ rst, then N(ar³, bs³, ct³) = N(a,b,c).
(b) Show that N(1,1,1) + N(1,1,g) + N(1,1,g²) = 3p².
(c) Show that N(1,g,1) + N(1,g,g) + N(1,g,g²) = 3p².
(d) Deduce that N(1,g,g²) = N(1,1,1).
(e) Show that N(a,b,c) = N(1,1,abc).


Therefore we can reduce our question to counting the solutions (x,y,z) to
$$x^3+y^3+dz^3\equiv0\pmod p.$$
We will transform this problem to counting solutions on a curve: If x + y ≢ 0 (mod p), then let u ≡ (x + y)/(72d) ≢ 0 (mod p), and v ≡ 36d − y/u, w ≡ −z/(6u) (mod p), to obtain
$$v^2\equiv w^3-432d^2\pmod p.$$
Given a solution (v,w) we can reverse this process and create p − 1 solutions to the original equation. If x + y ≡ 0 (mod p), then z ≡ 0 (mod p), and so there are p such solutions. Therefore we have proved that
$$N(1,1,d)=p+(p-1)\,\#\{x,y\pmod p:\ y^2\equiv x^3-3(12d)^2\pmod p\}.\tag{14.3.1}$$
The problem of determining N(a,b,c) has now been reduced to special cases of determining the number of mod p points satisfying the following equation:


14.4. The equation E_b: y² = x³ + b


Let p be a prime ≡ 1 (mod 3). By the theory of binary quadratic forms (see chapters 9 and 12), we know there are unique integers a and b, up to sign, for which p = a² + 3b². We select the sign of a so that a ≡ 2 (mod 3).


Theorem 14.2. Let p be a prime ≡ 1 (mod 3) with a primitive root g, and define a and b as above. If p does not divide ℓ, then
$$N_p(\ell):=\#\{(m,n)\pmod p:\ m^2\equiv n^3+\ell\pmod p\}=p+\left(\frac{\ell}{p}\right)u,$$
where
$$u:=\begin{cases}2a&\text{if }\ell\text{ is a cube mod }p,\\-a+3b&\text{if }\ell/g\text{ is a cube mod }p,\\-a-3b&\text{if }\ell/g^2\text{ is a cube mod }p.\end{cases}$$


There are three possible ways to write 4p in the form u² + 3v² with u ≡ 1 (mod 3):
$$4p=(2a)^2+3(2b)^2=(-a+3b)^2+3(a+b)^2=(-a-3b)^2+3(a-b)^2.$$
In each case we have |u| ≤ 2√p and so |N_p(ℓ) − p| ≤ 2√p for every ℓ ≢ 0 (mod p).


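Proof. We have
$$N_p(\ell)=\sum_{n\pmod p}\left(1+\left(\frac{n^3+\ell}{p}\right)\right)=p+S_\ell,\quad\text{where}\quad S_\ell:=\sum_{n\pmod p}\left(\frac{n^3+\ell}{p}\right).$$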


Exercise 14.4.1. Let p be a prime ≡ 1 (mod 3) with a primitive root g and suppose (ℓ,p) = 1.
(a) Prove that if L ≡ ℓr³ (mod p), then $S_L=\left(\frac rp\right)S_\ell$.
Define $T_\ell:=\left(\frac{\ell}{p}\right)S_\ell$, so that T_L = T_ℓ.
(b) Prove that if ℓ ≡ g^k (mod p) and i is the least residue of k (mod 3), then T_ℓ = T_{g^i}.
(c) Prove that T_1 is even, whereas T_g and T_{g²} are odd.
(d) Prove that each T_ℓ ≡ 1 (mod 3).


Therefore there exist integers A, B, C such that T_1 = 2A, T_g = −A + 3B, and T_{g²} = −A − 3C.


2 It is difficult to determine the sign of b in Theorem 14.2 by the elementary methods employed here.




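We will sum up the T_ℓ in two different ways: By exercise 14.4.1(b) we have
$$\sum_{\ell\not\equiv0\pmod p}T_\ell=\frac{p-1}{3}\left(T_1+T_g+T_{g^2}\right)=(p-1)(B-C).$$
By definition and then exercise 14.1.1 we have
$$\sum_{\ell\not\equiv0\pmod p}T_\ell=\sum_{1\le n\le p}\ \sum_{\ell\pmod p}\left(\frac{\ell(\ell+n^3)}{p}\right)=(p-1)+\sum_{1\le n\le p-1}(-1)=0.$$
Together these equations imply that C = B.


We will sum up the T_ℓ² in two different ways: First,
$$\sum_{\ell\not\equiv0\pmod p}T_\ell^2=\frac{p-1}{3}\left((2A)^2+(-A+3B)^2+(-A-3B)^2\right)=2(p-1)(A^2+3B^2).$$
Then, by definition (note T_ℓ² = S_ℓ²) we have
$$\sum_{\ell\not\equiv0\pmod p}T_\ell^2=\sum_{m,n\pmod p}\ \sum_{\ell\pmod p}\left(\frac{(\ell+m^3)(\ell+n^3)}{p}\right)-\sum_{m,n\pmod p}\left(\frac{m^3n^3}{p}\right).$$
By exercise 14.1.1 the inner sum over ℓ equals p − 1 if m³ ≡ n³ (mod p), and −1 otherwise; there are 1 + 3(p − 1) pairs (m,n) with m³ ≡ n³ (mod p), and the last sum is $\left(\sum_m\left(\frac mp\right)\right)^2=0$. Hence this equals
$$p(1+3(p-1))-p^2-0=2p(p-1).$$
Together these equations imply that p = A² + 3B².


Therefore A = ±a and B = ±b (as these are unique up to sign), and by exercise 14.4.1(d) we have 2A ≡ 1 (mod 3); that is, A ≡ 2 (mod 3) so that A = a.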


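By inserting Theorem 14.2 into (14.3.1), and using exercise 14.3.2(e), we deduce that, with d the product of the three coefficients,
$$N(a,b,c)=p^2+(p-1)u,\quad\text{where}\quad u:=\begin{cases}2a&\text{if }d/2\text{ is a cube mod }p,\\-a+3b&\text{if }dg/2\text{ is a cube mod }p,\\-a-3b&\text{if }dg^2/2\text{ is a cube mod }p\end{cases}$$
(here the a and b appearing in u are those of p = a² + 3b²).


In our example above with p = 7 = 2² + 3·1², the three possible values for u are 4, −5, or 1, so that N(a,b,c) equals one of 73, 19, or 55, confirming our calculations.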
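Theorem 14.2 can also be tested numerically. The Python sketch below (helper names are illustrative; it uses `math.isqrt`, so Python 3.8 or later) finds a and b with p = a² + 3b² and a ≡ 2 (mod 3), computes $T_\ell=\left(\frac{\ell}{p}\right)(N_p(\ell)-p)$ by brute force, and checks that T_ℓ = 2a when ℓ is a cube and otherwise lies in {−a + 3b, −a − 3b}. Since the theorem does not pin down the sign of b by elementary means, the sketch only tests membership in that set.

```python
import math


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p (Euler's criterion)."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def a_b_for(p: int) -> tuple:
    """Return (a, b) with p = a^2 + 3 b^2, a = 2 (mod 3) and b > 0."""
    for b in range(1, math.isqrt(p // 3) + 1):
        rest = p - 3 * b * b
        a = math.isqrt(rest)
        if a * a == rest:
            return (a if a % 3 == 2 else -a), b
    raise ValueError(f"{p} is not of the form a^2 + 3b^2")


def N_p(ell: int, p: int) -> int:
    """N_p(ell) = #{(m, n) mod p : m^2 = n^3 + ell mod p}."""
    sqrt_count = [0] * p            # sqrt_count[t] = #{m : m^2 = t mod p}
    for m in range(p):
        sqrt_count[m * m % p] += 1
    return sum(sqrt_count[(n ** 3 + ell) % p] for n in range(p))


def check_theorem_14_2(p: int) -> None:
    a, b = a_b_for(p)
    for ell in range(1, p):
        T = legendre(ell, p) * (N_p(ell, p) - p)
        if pow(ell, (p - 1) // 3, p) == 1:  # ell is a cube mod p
            assert T == 2 * a
        else:
            assert T in (-a + 3 * b, -a - 3 * b)
        assert abs(T) <= 2 * math.sqrt(p)


for p in (7, 13, 19, 31, 37, 43, 61, 67, 73):
    check_theorem_14_2(p)
print("Theorem 14.2 verified for small primes p = 1 (mod 3)")
```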


Exercise 14.4.2. (a) Prove that if p does not divide abc, then
$$\#\{x,y\pmod p:\ ax^3+by^3\equiv c\pmod p\}=p-\chi(a/b)-\chi(b/a)+u,$$
where the character χ (mod p) has order 3.
(b) Deduce that #{w,x,y,z (mod p): x³ + y³ ≡ w³ + z³ (mod p)} = p³ + 6p(p − 1).


In general, it is of great interest to determine
$$\#\{x,y\pmod p:\ y^2\equiv x^3+ax+b\pmod p\}.$$


All such curves have the order 2 automorphism x → x, y → −y. We just succeeded in counting the solutions when a = 0. This is known to be a relatively simple case³ because the curve has the "extra" order 3 automorphism x → ωx, y → y, where ω is a non-trivial cube root of 1 (mod p). Similarly when b = 0 the curve has the extra order 4 automorphism x → −x, y → iy, where i² ≡ −1 (mod p), and we can also attack our problem by elementary means:


14.5. The equation y² = x³ + ax


Let N_p(a) be the number of pairs (x,y) (mod p) such that y² ≡ x³ + ax (mod p). We know that N_p(a) = p + S_a where
$$S_a:=\sum_{n\pmod p}\left(\frac{n^3+an}{p}\right).$$


Exercise 14.5.1. Let p be a prime ≡ 3 (mod 4). Observing that (−n)³ + a(−n) = −(n³ + an), deduce that S_a = 0, so that N_p(a) = p.


Henceforth assume that p ≡ 1 (mod 4). Calculations with a = −1 (that is, the curve y² ≡ x³ − x (mod p)) reveal that, for N_p = N_p(−1), we have
N_5 = 7, N_13 = 7, N_17 = 15, N_29 = 39, N_37 = 39,
N_41 = 31, N_53 = 39, N_61 = 71, N_73 = 79, ....


These are all odd and close to p, so we compute E_p = E_p(−1) := S_{−1}/2:
E_5 = 1, E_13 = −3, E_17 = −1, E_29 = 5, E_37 = 1,
E_41 = −5, E_53 = −7, E_61 = 5, E_73 = 3, ....
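For readers who want to play with the data, here is a short Python sketch (function names are illustrative) that recomputes N_p = N_p(−1) and E_p = (N_p − p)/2 for the primes p ≡ 1 (mod 4) up to 73 by direct counting.

```python
def N_p_minus1(p: int) -> int:
    """N_p(-1) = #{(x, y) mod p : y^2 = x^3 - x mod p}."""
    sqrt_count = [0] * p            # sqrt_count[t] = #{y : y^2 = t mod p}
    for y in range(p):
        sqrt_count[y * y % p] += 1
    return sum(sqrt_count[(x ** 3 - x) % p] for x in range(p))


def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


primes_1_mod_4 = [p for p in range(5, 74) if p % 4 == 1 and is_prime(p)]
for p in primes_1_mod_4:
    N = N_p_minus1(p)
    print(p, N, (N - p) // 2)       # p, N_p, E_p
```

Extending the range of `range(5, 74)` is an easy way to generate more data before reading on.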


These are all odd numbers ... any guesses as to what they are? Bearing in mind that we are only dealing with primes ≡ 1 (mod 4), we might expect these odd numbers to reflect a property that primes ≡ 1 (mod 4) have, but not primes ≡ 3 (mod 4). If you can predict |E_p|, then try to predict the sign of E_p. The reader will learn more from playing with the data for a while than peeking immediately at the following result:


Let p be a prime ≡ 1 (mod 4). By the theory of binary quadratic forms (see chapters 9 and 12), we know there are unique integers a and b, up to sign, for which p = a² + b², where a is odd and b is even. We select the sign of a so that a ≡ 3 (mod 4). Since b² ≡ p − 1 (mod 8), we deduce that b/2 ≡ (p − 1)/4 (mod 2).


Theorem 14.3. Let p be a prime ≡ 1 (mod 4) with primitive root g, and let a and b be as above. If $\left(\frac{\ell}{p}\right)=1$, then select r (mod p) for which r² ≡ ℓ (mod p). Then
$$N_p(\ell)=p+2a\left(\frac rp\right).$$
If $\left(\frac{\ell}{p}\right)=-1$, then select r (mod p) for which r² ≡ ℓ/g (mod p). Then
$$N_p(\ell)=p+2b\left(\frac rp\right).$$
In each case we have |N_p(ℓ) − p| ≤ 2√p for every ℓ ≢ 0 (mod p).


3 Though this extra automorphism doesn't obviously help our elementary approach other than to render trivial the p ≡ 2 (mod 3) case.

4 It is difficult to determine the sign of b in Theorem 14.3 by the elementary methods employed here.


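Proof. We will sum the S_ℓ² in two different ways, using the following:


Exercise 14.5.2. Let p be a prime ≡ 1 (mod 4) with a primitive root g and suppose (ℓ,p) = 1.
(a) Prove that S_ℓ is even, so there exist integers A, B such that S_1 = 2A and S_g = 2B.
(b) Using that $\left(\frac{(-n)^3+\ell(-n)}{p}\right)=\left(\frac{n^3+\ell n}{p}\right)$, establish that $S_\ell\equiv3-\left(\frac{\ell}{p}\right)\pmod 4$.
(c) Deduce that A is odd and B is even.
(d) Prove that if c ≡ ℓr² (mod p), then $S_c=\left(\frac rp\right)S_\ell$.
(e) Deduce that $S_{-1}=(-1)^{\frac{p-1}{4}}\,2A$.


By exercise 14.5.2(d), if $\left(\frac{\ell}{p}\right)=1$, then S_ℓ² = S_1² = (2A)², and if $\left(\frac{\ell}{p}\right)=-1$, then S_ℓ² = S_g² = (2B)². Therefore, as S_0 = 0,
$$\sum_{\ell\pmod p}S_\ell^2=\frac{p-1}{2}\left((2A)^2+(2B)^2\right)=2(p-1)(A^2+B^2).$$
On the other hand, by definition and exercise 14.1.1 we have
$$\sum_{\ell\pmod p}S_\ell^2=\sum_{\ell\pmod p}\ \sum_{m,n\pmod p}\left(\frac{mn(m^2+\ell)(n^2+\ell)}{p}\right)=\sum_{m,n\pmod p}\left(\frac{mn}{p}\right)\sum_{\ell\pmod p}\left(\frac{(\ell+m^2)(\ell+n^2)}{p}\right)$$
$$=p\sum_{\substack{m,n\pmod p\\ m^2\equiv n^2\pmod p}}\left(\frac{mn}{p}\right)-\sum_{m,n\pmod p}\left(\frac{mn}{p}\right).$$
The second sum is 0. In the first sum we get 0 when n ≡ 0. Otherwise m ≡ ±n (mod p) so $\left(\frac{mn}{p}\right)=\left(\frac{\pm n^2}{p}\right)=1$, and so the first sum equals (p − 1)·2. Hence the total is 2p(p − 1).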


Together these two equations imply that A² + B² = p. To complete the proof we need to determine the value of A (mod 4) (and therefore its sign). This is perhaps easiest via another transformation: If y² ≡ n³ − n (mod p) with n ≢ 0 (mod p), then n² − 1 ≡ nz² (mod p) with y ≡ nz (mod p). This map is invertible and, since y ≡ 0 (mod p) when n ≡ 0 (mod p), we deduce that
$$N_p(-1)=1+\#\{(n,z)\pmod p:\ n^2-z^2n-1\equiv0\pmod p\}.$$
For each given z (mod p) this has $1+\left(\frac{z^4+4}{p}\right)$ solutions n (mod p). This equals 2 if z ≡ 0. If r ≢ 0 (mod p), then there are either 0 or 4 solutions to z⁴ ≡ r (mod p) (as p ≡ 1 (mod 4)), and so
$$N_p(-1)=3+\sum_{r=1}^{p-1}\left(1+\left(\frac{r+4}{p}\right)\right)\#\{z:\ z^4\equiv r\pmod p\}\equiv3+\#\{z:\ z^4\equiv-4\pmod p\}\pmod 8.$$
Now z⁴ + 4 ≡ (z − 1 + i)(z − 1 − i)(z + 1 + i)(z + 1 − i) (mod p) where i² ≡ −1 (mod p), and so N_p(−1) ≡ 3 + 4 ≡ −1 (mod 8). We therefore deduce from exercise 14.5.2(e) that $N_p(-1)=p+2A(-1)^{\frac{p-1}{4}}\equiv7\pmod 8$, so that A ≡ 3 (mod 4), and therefore A = a.


14.6. A more general viewpoint on counting solutions modulo p 


These results (Theorems 14.2 and 14.3) generalize nicely: Suppose that p does not divide 4a³ + 27b² (so that x³ + ax + b has no repeated roots mod p). Let
$$N_p:=\#\{(m,n)\pmod p:\ m^2\equiv n^3+an+b\pmod p\}.$$
The Hasse-Weil Theorem states that there exists an algebraic integer ½(u + v√−d) such that u² + dv² = 4p, for which N_p = p + u. This implies that |N_p − p| ≤ 2√p. This will be proved in [Grab].
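The bound |N_p − p| ≤ 2√p is easy to observe numerically. The following Python sketch (illustrative names) counts points on every curve y² ≡ x³ + ax + b (mod p) with p ∤ 4a³ + 27b², for a few small primes p ≥ 5, and records the largest value of |N_p − p|/√p. It illustrates the statement rather than proving it.

```python
import math


def affine_count(a: int, b: int, p: int) -> int:
    """N_p = #{(m, n) mod p : m^2 = n^3 + a n + b mod p}."""
    sqrt_count = [0] * p
    for m in range(p):
        sqrt_count[m * m % p] += 1
    return sum(sqrt_count[(n ** 3 + a * n + b) % p] for n in range(p))


def worst_ratio(p: int) -> float:
    """max |N_p - p| / sqrt(p) over non-singular curves y^2 = x^3 + ax + b mod p."""
    worst = 0.0
    for a in range(p):
        for b in range(p):
            if (4 * a ** 3 + 27 * b * b) % p == 0:
                continue  # repeated root: skip this curve
            worst = max(worst, abs(affine_count(a, b, p) - p) / math.sqrt(p))
    return worst


for p in (5, 7, 11, 13, 17, 19, 23):
    print(p, round(worst_ratio(p), 3))  # each ratio should be at most 2
```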

In general, for the curve f(x,y) = 0 one might guess that for arbitrary m and n (mod p), the value of f(m,n) (mod p) is equally likely to be in any given residue class; in particular, the probability that it is 0 (mod p) is 1/p, so we might expect that N_p(f) := #{(m,n) (mod p): f(m,n) ≡ 0 (mod p)} is close to p. We have proved this for linear and quadratic f and some examples of cubic f, with an error term no bigger than 2√p. This is the sort of error term one would obtain from a suitable probabilistic model, and so we might conjecture that a similar error term holds more generally. Indeed it is known that if a curve f(x,y) = 0 has genus g ≥ 1 (the cubic curves have genus 1), then |N_p(f) − p| ≤ 2g√p. The notion of genus is beyond the scope of this book, but when h(x) (mod p) is a polynomial of degree d ≥ 2 without repeated roots and f(x,y) = y² − h(x), then we can deduce that
$$|N_p(f)-p|\le(d-1)\sqrt p.\tag{14.6.1}$$
We conclude this section by proving a surprisingly general result.


Let f(x_1, x_2, ..., x_n) ∈ ℤ[x_1, x_2, ..., x_n] be of degree d.⁵ The number of solutions to f ≡ 0 (mod p) is congruent to
$$\sum_{m_1,\dots,m_n\pmod p}\left(1-f(m_1,\dots,m_n)^{p-1}\right)\pmod p,\tag{14.6.2}$$
since
$$1-f(m)^{p-1}\equiv\begin{cases}1\pmod p&\text{if }f(m)\equiv0\pmod p,\\0\pmod p&\text{if }f(m)\not\equiv0\pmod p,\end{cases}$$
by Fermat's Little Theorem. The first term, all the 1's, evidently sums to p^n ≡ 0 (mod p). When we expand the second term, we get a sum of terms of the form $m_1^{k_1}\cdots m_n^{k_n}$, each of total degree k_1 + ⋯ + k_n ≤ d(p − 1). If such a term, summed over each m_i (mod p), is non-zero mod p, then each k_i ≥ p − 1, since $\sum_{m\pmod p}m^k\equiv0\pmod p$ unless k is a positive multiple of p − 1. Hence if the sum over each m_i is non-zero, then d(p − 1) ≥ k_1 + ⋯ + k_n ≥ n(p − 1), and so d ≥ n. We can therefore deduce the following theorem.

5 The degree of the monomial $c\,x_1^{e_1}x_2^{e_2}\cdots x_n^{e_n}$ is e_1 + e_2 + ⋯ + e_n. The degree of a polynomial in several variables is the largest of the degrees of the monomials from which the polynomial is constructed.


Theorem 14.4 (Chevalley-Warning Theorem). If f(x_1, ..., x_n) ∈ ℤ[x_1, ..., x_n] has degree < n, then
$$\#\{m_1,\dots,m_n\pmod p:\ f(m_1,\dots,m_n)\equiv0\pmod p\}\equiv0\pmod p.$$
Therefore if f(0, 0, ..., 0) = 0, that is, the constant term of f is 0, then there are at least p − 1 distinct non-zero solutions to f(m_1, ..., m_n) ≡ 0 (mod p).

One example is the equation ax² + by² + cz² ≡ 0 (mod p), since here n = 3 > d = 2 and f(0,0,0) = 0.
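A quick numerical illustration of Theorem 14.4: the Python sketch below (illustrative names; the particular cubic is an arbitrary choice) counts zeros mod p of a cubic in four variables, so d = 3 < n = 4, and prints the count modulo p. Since the origin is always a zero, a count divisible by p is automatically at least p.

```python
from itertools import product


def count_zeros(f, n: int, p: int) -> int:
    """Number of (m_1, ..., m_n) mod p with f(m_1, ..., m_n) = 0 mod p."""
    return sum(1 for ms in product(range(p), repeat=n) if f(*ms) % p == 0)


def cubic(x: int, y: int, z: int, w: int) -> int:
    # degree 3 in 4 variables, with zero constant term
    return x ** 3 + 2 * y ** 3 + 3 * z ** 3 + 5 * w ** 3 + x * y * z


for p in (7, 11, 13):
    N = count_zeros(cubic, 4, p)
    print(p, N, N % p)  # Chevalley-Warning predicts N % p == 0
```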

Exercise 14.6.1. Use (14.6.1) to show that if h(x) (mod p) is a polynomial of degree d ≥ 2 without repeated roots, then
$$\left|\sum_{n\pmod p}\left(\frac{h(n)}{p}\right)\right|\le(d-1)\sqrt p.$$


Exercise 14.6.2. Prove that the number of n (mod p) for which (=) = (=) = 1 equals 


p/4 plus or minus an error of at most 4√p.


Exercise 14.6.3.' Let p be an odd prime which can be written as p = a? + b? with a = 3 


(mod 4). Prove that the number of n (mod p) for which (+) = ) = (2+) = 1 equals 


pti tee — 2 if p=1 (mod 8), and equals ptt te —1ifp=5 (mod 8). 


Exercise 14.6.4. Prove that the number of n (mod p) for which $\left(\frac{n+1}{p}\right)=\cdots=\left(\frac{n+k}{p}\right)=1$, with k ≤ log p, equals p/2^k plus or minus an error of at most k√p.


Exercise 14.6.5. Let δ_1, ..., δ_k be an arbitrary sequence of 1's and −1's, with k ≤ log p. Prove that the number of n (mod p) for which $\left(\frac{n+j}{p}\right)=\delta_j$ for j = 1, 2, ..., k, equals p/2^k plus or minus an error of at most k√p.


This is an example of an important family of questions: Let f(n) be a multiplicative function that only takes values −1 and 1. Let δ_1, ..., δ_k be an arbitrary sequence of 1's and −1's. Do there exist infinitely many integers n for which f(n + j) = δ_j for j = 1, 2, ..., k? And, if so, how often does this occur? Does each sign pattern occur (more or less) equally often? We have resolved this problem when f(·) is a Legendre symbol, in the last two problems.

The outstanding open question of this type is when f(·) is the Liouville function. There has been some recent progress and there are many interesting applications; it is the subject of much ongoing research.
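Exercises 14.6.4 and 14.6.5 can be explored numerically. The Python sketch below (illustrative names) tabulates, for a prime p and a length k, how many n (mod p) give each sign pattern of $\left(\frac{n+1}{p}\right),\dots,\left(\frac{n+k}{p}\right)$. It skips the k values of n for which some n + j ≡ 0 (mod p) and compares each count with p/2^k.

```python
import math
from itertools import product


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p (Euler's criterion)."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def pattern_counts(p: int, k: int) -> dict:
    """Map each sign pattern (delta_1..delta_k) to #{n mod p : ((n+j)/p) = delta_j}."""
    counts = {pattern: 0 for pattern in product((1, -1), repeat=k)}
    for n in range(p):
        signs = tuple(legendre(n + j, p) for j in range(1, k + 1))
        if 0 not in signs:          # skip n with some n + j = 0 mod p
            counts[signs] += 1
    return counts


p, k = 1009, 4
counts = pattern_counts(p, k)
for pattern, c in counts.items():
    deviation = c - p / 2 ** k
    print(pattern, c, f"deviation {deviation:+.1f}, allowed {k * math.sqrt(p):.1f}")
```

In practice the deviations are much smaller than k√p, which is consistent with the error terms in the exercises being upper bounds rather than typical sizes.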


Appendix 14A. Gauss sums 


For a given Dirichlet character χ (mod q) we define the Gauss sum, g(χ), by
$$g(\chi):=\sum_{a=1}^{q}\chi(a)\,e^{2i\pi a/q}.$$
Note that the summand χ(a)e^{2iπa/q} depends only on the value of a (mod q). Gauss sums play an important role in number theory and have some beautiful properties.


14.7. Identities for Gauss sums 


By making the change of variable a ≡ nb (mod q) for some integer n with (n,q) = 1, the variable b runs through a complete system of residues mod q as a does. Therefore we obtain the surprising identity
$$\sum_{b=1}^{q}\chi(b)e^{2i\pi nb/q}=\bar\chi(n)\sum_{b=1}^{q}\chi(nb)e^{2i\pi nb/q}=\bar\chi(n)g(\chi).\tag{14.7.1}$$
b=1 b=1 


Therefore if q is prime and χ is non-principal, then
$$(q-1)|g(\chi)|^2=\sum_{n=0}^{q-1}|\bar\chi(n)g(\chi)|^2=\sum_{n=0}^{q-1}\left|\sum_{b=1}^{q}\chi(b)e^{2i\pi nb/q}\right|^2,$$
since the n = 0 sum equals $\sum_{b=1}^{q}\chi(b)=0$. Expanding the square we obtain
$$\sum_{b=1}^{q}\chi(b)\sum_{c=1}^{q}\bar\chi(c)\sum_{n=0}^{q-1}e^{2i\pi n(b-c)/q}=q\sum_{b=1}^{q}|\chi(b)|^2=q\,\phi(q),$$
since $\sum_{n=0}^{q-1}e^{2i\pi na/q}=0$ unless q divides a. Therefore we have proved that
$$|g(\chi)|^2=q.$$




To better use this we have
$$\overline{g(\chi)}=\sum_{a=1}^{q}\bar\chi(a)e^{-2i\pi a/q}=\chi(-1)g(\bar\chi)$$
by (14.7.1), so that $g(\chi)g(\bar\chi)=\chi(-1)|g(\chi)|^2=\chi(-1)q$. In particular if χ = (·/q), so that χ̄ = χ, then
$$g((\cdot/q))^2=\left(\frac{-1}{q}\right)q.$$


Taking the square root, it remains to determine which sign gives the value of g((·/q)). It took Gauss four years to figure this out, so we will simply state his result:
$$g((\cdot/q))=\begin{cases}\sqrt q&\text{if }q\equiv1\pmod 4,\\ i\sqrt q&\text{if }q\equiv3\pmod 4.\end{cases}\tag{14.7.2}$$
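Gauss's evaluation (14.7.2) is easy to confirm numerically for small primes. The Python sketch below (illustrative names) computes g((·/q)) in floating point with `cmath.exp` and prints it next to √q or i√q; it can also be used to confirm |g|² = q.

```python
import cmath
import math


def legendre(n: int, q: int) -> int:
    """Legendre symbol (n/q) for an odd prime q (Euler's criterion)."""
    r = pow(n % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r


def gauss_sum(q: int) -> complex:
    """g((./q)) = sum_{a=1}^{q} (a/q) e^{2 pi i a/q}; the a = q term is 0."""
    return sum(legendre(a, q) * cmath.exp(2j * math.pi * a / q) for a in range(1, q))


for q in (3, 5, 7, 11, 13, 17, 19, 23):
    g = gauss_sum(q)
    predicted = math.sqrt(q) if q % 4 == 1 else 1j * math.sqrt(q)
    print(q, complex(round(g.real, 9), round(g.imag, 9)), predicted)
```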


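Another proof of the law of quadratic reciprocity. For an odd prime p ≠ q, let q* = (−1/q)q and g = g((·/q)), so that g² = q*. Now, working modulo p in the ring of algebraic integers,
$$g^p=\left(\sum_{a=1}^{q}\left(\frac aq\right)e^{2i\pi a/q}\right)^p\equiv\sum_{a=1}^{q}\left(\frac aq\right)^pe^{2i\pi ap/q}\pmod p,$$
and $\left(\frac aq\right)^p=\left(\frac aq\right)$ as p is odd. Then, by (14.7.1), we have
$$g^p\equiv\sum_{a=1}^{q}\left(\frac aq\right)e^{2i\pi ap/q}=\left(\frac pq\right)g\pmod p.$$
We may divide through by g as (g², p) = (q*, p) = 1, so that
$$\left(\frac pq\right)\equiv g^{p-1}=(q^*)^{(p-1)/2}\equiv\left(\frac{q^*}{p}\right)\pmod p,$$
by Euler's criterion. Both sides are integers equal to 1 or −1 and differ by a multiple of p, which is ≥ 3, and so they must be equal. That is, we obtain Theorem 8.5, the law of quadratic reciprocity.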


14.8. Dirichlet L-functions at s = 1


We now use (14.7.1) to try to find a simple expression for L(1,χ̄). We again let q be prime so that $\sum_{b=1}^{q}\chi(b)=0$, and therefore the identity (14.7.1) holds for all integers n (not just those n coprime to q). Assuming that there are no convergence issues in swapping the orders of summation, we have
$$g(\chi)L(1,\bar\chi)=\sum_{n\ge1}\frac{g(\chi)\bar\chi(n)}{n}=\sum_{n\ge1}\frac1n\sum_{b=1}^{q-1}\chi(b)e^{2i\pi nb/q}=\sum_{b=1}^{q-1}\chi(b)\sum_{n\ge1}\frac{e^{2i\pi nb/q}}{n}.$$
The sum over n is the Taylor series for −log(1 − t) with t = e^{2iπb/q} (since each t ≠ 1). Therefore
$$g(\chi)L(1,\bar\chi)=-\sum_{b=1}^{q-1}\chi(b)\log\left(1-e^{2i\pi b/q}\right).$$




Exercise 14.8.1. (a) Prove that arg(1 − e^{iθ}) ∈ (−π/2, π/2).
(b) Deduce that if 0 < θ < 2π, then log(1 − e^{iθ}) − log(1 − e^{−iθ}) = i(θ − π), where θ − π ∈ (−π, π).


Now assume that χ(−1) = −1 and add the b and q − b terms in the sum above, so that by the last exercise we have
$$2g(\chi)L(1,\bar\chi)=-\sum_{b=1}^{q-1}\chi(b)\left(\log(1-e^{2i\pi b/q})-\log(1-e^{-2i\pi b/q})\right)=-i\sum_{b=1}^{q-1}\chi(b)\left(\frac{2\pi b}{q}-\pi\right).$$
The second sum on the right-hand side is 0, and so multiplying through by −g(χ̄)/(2q), we obtain
$$L(1,\bar\chi)=\frac{i\pi\,g(\bar\chi)}{q^2}\sum_{b=1}^{q-1}b\,\chi(b),$$
as −g(χ)g(χ̄) = q.

Now let χ = (·/q) with prime q ≡ 3 (mod 4) where q > 3, so that χ̄ = χ. Dirichlet's class number formula (given in section 12.15 of appendix 12D with d = −q) reads πh(−q) = √q L(1,χ), and therefore the last displayed formula becomes
$$h(-q)=-\frac1q\sum_{b=1}^{q-1}b\left(\frac bq\right),$$
since g((·/q)) = i√q by (14.7.2). This is Jacobi's conjecture, stated as (12.15.1).
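Jacobi's formula can be compared with a direct count of class numbers. In the Python sketch below (illustrative names), `h_jacobi` evaluates $-\frac1q\sum_{b<q}b\left(\frac bq\right)$, while `h_forms` counts reduced binary quadratic forms ax² + bxy + cy² of discriminant −q using the usual reduction conditions |b| ≤ a ≤ c, with b ≥ 0 if |b| = a or a = c.

```python
def legendre(n: int, q: int) -> int:
    """Legendre symbol (n/q) for an odd prime q (Euler's criterion)."""
    r = pow(n % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r


def h_jacobi(q: int) -> int:
    """-(1/q) * sum_{b<q} b (b/q), for a prime q = 3 (mod 4), q > 3."""
    s = sum(b * legendre(b, q) for b in range(1, q))
    assert s % q == 0, "the sum should be divisible by q"
    return -s // q


def h_forms(D: int) -> int:
    """Number of reduced forms ax^2 + bxy + cy^2 with b^2 - 4ac = D < 0."""
    count, a = 0, 1
    while 3 * a * a <= -D:                  # reduced forms have 3a^2 <= |D|
        for b in range(-a + 1, a + 1):      # -a < b <= a
            if (b * b - D) % (4 * a) == 0:
                c = (b * b - D) // (4 * a)
                if c > a or (c == a and b >= 0):
                    count += 1
        a += 1
    return count


for q in (7, 11, 19, 23, 31, 43, 47, 59, 67, 71, 79, 83):
    print(q, h_jacobi(q), h_forms(-q))
```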


14.9. Jacobi sums 


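Let χ and ψ be characters mod q and define the Jacobi sum
$$j(\chi,\psi):=\sum_{\substack{r,s\pmod q\\ r+s\equiv1\pmod q}}\chi(r)\psi(s).$$


To evaluate this sum we state the condition "r + s ≡ 1 (mod q)" in terms of a sum, so that
$$j(\chi,\psi)=\sum_{r,s\pmod q}\chi(r)\psi(s)\cdot\frac1q\sum_{k=0}^{q-1}e^{2i\pi k(r+s-1)/q}=\frac1q\sum_{k=0}^{q-1}e^{-2i\pi k/q}\sum_{r\pmod q}\chi(r)e^{2i\pi kr/q}\sum_{s\pmod q}\psi(s)e^{2i\pi ks/q}$$
$$=\frac1q\sum_{k=0}^{q-1}e^{-2i\pi k/q}\,\bar\chi(k)g(\chi)\,\bar\psi(k)g(\psi)=\frac{g(\chi)g(\psi)}{q}\,\overline{g(\chi\psi)},$$
using (14.7.1), which holds for every k when q is prime and χ, ψ are non-principal. If q is prime and each of χ, ψ, and χψ is non-principal, then we know that |g(χ)| = |g(ψ)| = |g(χψ)| = √q so that |j(χ,ψ)| = √q. By its definition j(χ,ψ) is an algebraic integer and belongs to the field defined by the values of χ and ψ.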




14.10. The diagonal cubic, revisited 


Let p be a prime ≡ 1 (mod 3). Since the group of characters mod p is isomorphic to the multiplicative group of reduced residues mod p, we know that there are two characters χ, χ² (mod p) of order 3. We can establish the analogy to Corollary 8.1.1 for cubic residues:
Exercise 14.10.1. Prove that if p ∤ a, then
$$\#\{x\pmod p:\ ax^3\equiv u\pmod p\}=1+\chi(a^{-1}u)+\chi^2(a^{-1}u).$$


By exercise 14.10.1, N(a,b,c) equals the sum over triples u, v, w (mod p) for which u + v + w ≡ 0 (mod p), of
$$\left(1+\chi(a^{-1}u)+\chi^2(a^{-1}u)\right)\left(1+\chi(b^{-1}v)+\chi^2(b^{-1}v)\right)\left(1+\chi(c^{-1}w)+\chi^2(c^{-1}w)\right).$$
We again multiply the triples out. The first product, 1·1·1, sums to p². Any other product that contains a 1 sums to 0, since the remaining variables can be summed independently (and each independent sum is of the shape Σ_t χ(t) = 0 or Σ_t χ²(t) = 0). Therefore
$$N(a,b,c)=p^2+\sum_{1\le i,j,k\le2}\ \sum_{\substack{u,v,w\pmod p\\ u+v+w\equiv0\pmod p}}\chi^i(a^{-1}u)\chi^j(b^{-1}v)\chi^k(c^{-1}w).$$
We may assume u ≢ 0 (mod p) since those summands equal 0. Therefore we can write v ≡ −ur, w ≡ −us and separate out the sum $\sum_{u\pmod p}\chi^{i+j+k}(u)$. This equals 0 when 3 does not divide i + j + k. This therefore leaves us with only the terms where i = j = k, in which case the sum over u equals p − 1. For i = j = k = 1 we have
$$\sum_{\substack{u,v,w\pmod p\\ u+v+w\equiv0\pmod p}}\chi(uvw)=(p-1)\sum_{\substack{r,s\pmod p\\ r+s\equiv1\pmod p}}\chi(rs)=(p-1)\,j(\chi,\chi),$$
and likewise for χ². Therefore
$$N(a,b,c)=p^2+(p-1)\left(\bar\chi(d)\,j(\chi,\chi)+\chi(d)\,j(\bar\chi,\bar\chi)\right),$$
where d = abc. In section 14.9 we proved that j(χ,χ) is an algebraic integer in ℚ(√−3) of norm p, so we can write $\bar\chi(d)j(\chi,\chi)=\frac{u+v\sqrt{-3}}{2}$ with u ≡ v (mod 2), and u² + 3v² = 4p. We therefore recover the result,
$$N(a,b,c)=p^2+(p-1)u,$$
that we established in section 14.4. Moreover by calculating j(χ,χ) we can determine the sign of b in Theorem 14.2.
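The Jacobi-sum formula for N(a,b,c) can be checked numerically. The Python sketch below (illustrative names) builds a cubic character χ from a primitive root and computes j(χ,χ) in floating point. It then checks that |j(χ,χ)|² = p, that 2j(χ,χ) = u + v√−3 with u ≡ v (mod 2) and u² + 3v² = 4p, and that p² + (p − 1)(χ̄(d)j(χ,χ) + χ(d)j(χ̄,χ̄)) matches a brute-force count of N(1,1,d).

```python
import cmath
import math
from itertools import product


def primitive_root(p: int) -> int:
    """Smallest primitive root modulo the prime p."""
    factors = [r for r in range(2, p) if (p - 1) % r == 0 and all(r % s for s in range(2, r))]
    return next(g for g in range(2, p) if all(pow(g, (p - 1) // r, p) != 1 for r in factors))


def cubic_character(p: int) -> list:
    """chi[x] for x mod p: chi(g^k) = omega^k with omega = e^{2 pi i/3}, chi(0) = 0."""
    g, chi = primitive_root(p), [0j] * p
    for k in range(p - 1):
        chi[pow(g, k, p)] = cmath.exp(2j * math.pi * (k % 3) / 3)
    return chi


def jacobi_sum(chi: list, psi: list, p: int) -> complex:
    """j(chi, psi) = sum over r + s = 1 mod p of chi(r) psi(s)."""
    return sum(chi[r] * psi[(1 - r) % p] for r in range(p))


def N_formula(d: int, p: int) -> int:
    """p^2 + (p-1)(conj(chi)(d) j(chi,chi) + chi(d) j(conj chi, conj chi)), rounded."""
    chi = cubic_character(p)
    chi_bar = [z.conjugate() for z in chi]
    value = p * p + (p - 1) * (chi_bar[d % p] * jacobi_sum(chi, chi, p)
                               + chi[d % p] * jacobi_sum(chi_bar, chi_bar, p))
    return round(value.real)


def N_brute(d: int, p: int) -> int:
    """N(1,1,d) = #{(x,y,z) mod p : x^3 + y^3 + d z^3 = 0 mod p}."""
    return sum(1 for x, y, z in product(range(p), repeat=3)
               if (x ** 3 + y ** 3 + d * z ** 3) % p == 0)


for p in (7, 13):
    J = jacobi_sum(cubic_character(p), cubic_character(p), p)
    u, v = round(2 * J.real), round(2 * J.imag / math.sqrt(3))
    print(p, round(abs(J) ** 2, 9), (u, v), u * u + 3 * v * v == 4 * p and (u - v) % 2 == 0)
    print([N_formula(d, p) == N_brute(d, p) for d in range(1, p)])
```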


Chapter 15 


Combinatorial number theory 


Combinatorial number theory, one of the oldest topics in mathematical research, is
still an area of active interest today. In this chapter we sample several topics, both
old and new, from the more combinatorial side of number theory.


15.1. Partitions 
Let $p(n)$ denote the number of ways of partitioning $n$ into positive integers. For
example $p(7) = 15$ since

$$\begin{aligned} 7 &= 6+1 = 5+2 = 5+1+1 = 4+3 = 4+2+1 = 4+1+1+1 \\ &= 3+3+1 = 3+2+2 = 3+2+1+1 = 3+1+1+1+1 = 2+2+2+1 \\ &= 2+2+1+1+1 = 2+1+1+1+1+1 = 1+1+1+1+1+1+1. \end{aligned}$$


Euler observed that there is a beautiful generating function for $p(n)$: In the
generating function, $p(n)$ is the coefficient of $t^n$, and for each partition $n = a_1 + \cdots + a_k$
we can think of $t^n$ as $t^{a_1} \cdots t^{a_k}$. Splitting this product up into the values of the $a_i$,
but taking $(t^a)^j$ if $j$ of the $a_i$'s equal $a$, we see that

$$\sum_{n \ge 0} p(n)\, t^n = \prod_{a \ge 1} \Big( \sum_{j \ge 0} (t^a)^j \Big) = \frac{1}{(1-t)(1-t^2)(1-t^3)\cdots}.$$


Similarly the generating function for the number of partitions into odd parts is

$$\frac{1}{(1-t)(1-t^3)(1-t^5)\cdots},$$

and the generating function for the number of partitions with no repeated parts is
$(1+t)(1+t^2)(1+t^3)\cdots$.


Exercise 15.1.1. Deduce that the number of partitions of n into odd parts is equal to the number 
of partitions of n with no repeated parts. 
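The generating functions above translate directly into a short dynamic-programming computation: multiplying by $1/(1-t^a)$ is a single upward pass over the coefficients, and multiplying by $1 + t^a$ is a single downward pass. The following Python sketch (the function name `partition_counts` is ours) computes $p(n)$, partitions into odd parts, and partitions into distinct parts, so that exercise 15.1.1 can be checked numerically before it is proved.

```python
def partition_counts(N, parts, distinct=False):
    """Coefficients of t^0..t^N in prod_{a in parts} 1/(1 - t^a),
    or in prod_{a in parts} (1 + t^a) when distinct=True."""
    coeffs = [1] + [0] * N
    for a in parts:
        if distinct:
            # multiply by (1 + t^a): go downwards so each part is used at most once
            for n in range(N, a - 1, -1):
                coeffs[n] += coeffs[n - a]
        else:
            # multiply by 1/(1 - t^a) = 1 + t^a + t^(2a) + ...: go upwards
            for n in range(a, N + 1):
                coeffs[n] += coeffs[n - a]
    return coeffs

N = 60
p = partition_counts(N, range(1, N + 1))
odd_parts = partition_counts(N, range(1, N + 1, 2))
distinct_parts = partition_counts(N, range(1, N + 1), distinct=True)
print(p[:11])                       # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
print(odd_parts == distinct_parts)  # True
```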




Partitions can be represented by rows and columns of dots in a Ferrers diagram;
for example $27 = 11+7+3+3+2+1$ is represented by

• • • • • • • • • • •
• • • • • • •
• • •
• • •
• •
•

the first row having 11 dots, the second 7, etc. Now, reading the diagram in the
other direction (by columns) yields the partition $27 = 6+5+4+2+2+2+2+1+1+1+1$.
This bijection between partitions is at the heart of many beautiful theorems about
partitions. For example, if a partition has $m$ parts, then its "conjugate" has largest
part $m$. Therefore the number of partitions with $\le m$ parts equals the number of
partitions with all parts $\le m$, which has generating function

$$\frac{1}{(1-t)(1-t^2)(1-t^3)\cdots(1-t^m)}.$$
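Conjugation is easy to compute: the $j$-th part of the conjugate counts the parts that are $\ge j$. The following Python sketch (helper names are ours) checks the example above and, for one choice of $n$ and $m$, the equality of the two counts.

```python
def conjugate(partition):
    """Conjugate partition: read the Ferrers diagram by columns."""
    if not partition:
        return []
    return [sum(1 for part in partition if part >= j) for j in range(1, max(partition) + 1)]

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists with all parts <= largest."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

print(conjugate([11, 7, 3, 3, 2, 1]))  # [6, 5, 4, 2, 2, 2, 2, 1, 1, 1, 1]
n, m = 12, 4
at_most_m_parts = sum(1 for q in partitions(n) if len(q) <= m)
parts_at_most_m = sum(1 for q in partitions(n, m))
print(at_most_m_parts == parts_at_most_m)  # True
```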


Looking at Ferrers diagrams, partitions come in pairs, each partition with its
conjugate, other than those that are self-conjugate (that is, those partitions in
which the conjugate partition is the same as the original partition). Self-conjugate
Ferrers diagrams are symmetric about the diagonal axis of the diagram. For example,
here is a self-conjugate Ferrers diagram, with its dots replaced by the labels 1, 2, 3
used below:

1 1 1 1 1 1
1 2 2 2 2
1 2 3
1 2
1 2
1

A self-conjugate partition is in 1-to-1 correspondence with another type of partition.

This example yields $19 = 6+5+3+2+2+1$. We have constructed another partition
of 19, using the same Ferrers diagram. The top row and first column contain 11
entries, marked by "1"s in the diagram; what is left of the second row and second
column has 7 remaining entries, marked by "2"s; and finally one element is left
(marked by a "3"), yielding the partition $19 = 11+7+1$.


Exercise 15.1.2. Prove that there is a bijection between self-conjugate partitions and partitions 
where all the entries are odd and distinct. Give an elegant form for the generating function for 
the number of self-conjugate partitions. 
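The decomposition into nested "1", "2", "3" regions described above can be sketched in Python as follows (our own illustration; `diagonal_hooks` is a hypothetical helper name). For the example it recovers $19 = 11 + 7 + 1$, and `check_hooks` confirms for $n \le 25$ that the map sends the self-conjugate partitions of $n$ exactly onto the partitions of $n$ into distinct odd parts.

```python
def conjugate(partition):
    if not partition:
        return []
    return [sum(1 for part in partition if part >= j) for j in range(1, max(partition) + 1)]

def partitions(n, largest=None):
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def diagonal_hooks(partition):
    """Sizes of the nested regions: row i and column i beyond the diagonal, plus the corner."""
    conj = conjugate(partition)
    hooks, i = [], 0
    while i < len(partition) and partition[i] > i:
        hooks.append((partition[i] - i) + (conj[i] - i) - 1)
        i += 1
    return hooks

def check_hooks(n_max=25):
    for n in range(1, n_max + 1):
        all_parts = list(partitions(n))
        self_conjugate = [q for q in all_parts if q == conjugate(q)]
        odd_distinct = [q for q in all_parts if all(x % 2 for x in q) and len(set(q)) == len(q)]
        if sorted(diagonal_hooks(q) for q in self_conjugate) != sorted(odd_distinct):
            return False
    return True

print(diagonal_hooks([6, 5, 3, 2, 2, 1]))  # [11, 7, 1]
print(check_hooks())                       # True
```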


The sequence $p(n)$ begins $p(1) = 1, 2, 3, 5, 7, 11, 15, 22, 30$; $p(10) =
42, 56, 77, 101, 135, 176, 231, 297, 385, 490$; $p(20) = 627, \ldots$; with $p(100) =
190569292$ and $p(1000) \approx 2.4 \times 10^{31}$. Ramanujan was intrigued by these numbers,
both by their growth (which seems quite fast) and by their congruence conditions.
For $n = 1000$ we see that there are roughly $10^{\sqrt{n}}$ partitions, which is an unusual
function in mathematics. Hardy and Ramanujan proved the extraordinary asymptotic

$$p(n) \sim \frac{1}{4\sqrt{3}\,n}\, e^{\pi\sqrt{2n/3}}, \tag{15.1.1}$$

and Rademacher developed their idea to give an exact formula.¹
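One can watch (15.1.1) take hold numerically. The following Python sketch (ours) computes $p(n)$ exactly with the generating-function recursion and prints the ratio of the Hardy-Ramanujan approximation to $p(n)$; the ratio decreases slowly towards 1.

```python
import math

def partition_numbers(N):
    """Exact p(0), ..., p(N) from prod 1/(1 - t^a)."""
    p = [1] + [0] * N
    for a in range(1, N + 1):
        for n in range(a, N + 1):
            p[n] += p[n - a]
    return p

def hardy_ramanujan(n):
    """Leading term of (15.1.1): exp(pi*sqrt(2n/3)) / (4*sqrt(3)*n)."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * math.sqrt(3) * n)

p = partition_numbers(1000)
for n in (10, 100, 1000):
    print(n, p[n], round(hardy_ramanujan(n) / p[n], 4))
```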


Ramanujan also noted several congruences for the $p(n)$:

$$p(5k+4) \equiv 0 \pmod 5, \quad p(7k+5) \equiv 0 \pmod 7, \quad p(11k+6) \equiv 0 \pmod{11},$$

for all $k$. Notice that these are all of the form $p(n) \equiv 0 \pmod q$ whenever $q$
divides $24n - 1$ and, for a long time, these seemed to be the only such congruences.
However, Ken Ono recently found many more such congruences, only a little more
complicated: For any prime $q \ge 5$ there exist primes $\ell$ such that $p(n) \equiv 0 \pmod q$
whenever $q\ell^3$ divides $24n - 1$.
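Ramanujan's three congruences are easy to test by machine. This Python sketch (ours) computes $p(n)$ for $n \le 1000$ and checks the congruences, using the reformulation that $q \mid 24n - 1$ exactly when $n \equiv 4, 5, 6 \pmod q$ for $q = 5, 7, 11$ respectively.

```python
def partition_numbers(N):
    """Exact p(0), ..., p(N) from prod 1/(1 - t^a)."""
    p = [1] + [0] * N
    for a in range(1, N + 1):
        for n in range(a, N + 1):
            p[n] += p[n - a]
    return p

p = partition_numbers(1000)
for q, r in [(5, 4), (7, 5), (11, 6)]:
    # q divides 24n - 1 exactly when n = r (mod q)
    print(q, all(p[n] % q == 0 for n in range(r, len(p), q)))  # True for each q
```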


15.2. Jacobi’s triple product identity 


There are many beautiful identities involving the power series from partitions. One
of the most extraordinary is Jacobi's powerful triple product identity: If $|x| < 1$,
then

$$\prod_{n \ge 1} (1 - x^{2n})(1 + x^{2n-1}z)(1 + x^{2n-1}z^{-1}) = \sum_{m \in \mathbb{Z}} x^{m^2} z^m.$$
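Before proving the identity, it is reassuring to test it numerically. The following Python sketch (ours) truncates the product and the sum and compares them at a few sample points with $|x| < 1$ and $z \ne 0$ (complex $z$ is allowed); the truncation lengths are arbitrary choices.

```python
def triple_product(x, z, terms=400):
    """Truncation of prod_{n>=1} (1 - x^(2n)) (1 + x^(2n-1) z) (1 + x^(2n-1) / z)."""
    value = 1
    for n in range(1, terms + 1):
        value *= (1 - x ** (2 * n)) * (1 + x ** (2 * n - 1) * z) * (1 + x ** (2 * n - 1) / z)
    return value

def theta_sum(x, z, M=80):
    """Truncation of sum_{m in Z} x^(m^2) z^m."""
    return sum(x ** (m * m) * z ** m for m in range(-M, M + 1))

for x, z in [(0.5, 1.3), (0.3, -0.7), (0.6, 0.4 + 0.9j), (0.8, 2.0)]:
    print(x, z, abs(triple_product(x, z) - theta_sum(x, z)))
```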


We will prove this in an exercise at the end of this section. For now, we shall
determine some useful consequences: If we let $x = t$ and $z = 1$, we obtain

$$\prod_{n \ge 1} (1 - t^{2n})(1 + t^{2n-1})^2 = \sum_{m \in \mathbb{Z}} t^{m^2}.$$

The sum on the right-hand side appeared when we were studying the number of
representations of integers as the sums of four squares in section 12.18 of appendix
12E. Now, if we let $x = t$ and $z = -1$, then we obtain

$$\prod_{n \ge 1} (1 - t^{2n})(1 - t^{2n-1})^2 = \sum_{m \in \mathbb{Z}} (-1)^m t^{m^2}. \tag{15.2.1}$$

Exercise 15.2.1. Prove that

$$\prod_{\ell \ge 1} \frac{1 - t^\ell}{1 + t^\ell} = \prod_{n \ge 1} \frac{(1 - t^n)^2}{1 - t^{2n}} = \sum_{m \in \mathbb{Z}} (-1)^m t^{m^2}.$$


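Much more generally we can take $x = t^a$, $z = t^b$, and $n = k+1$ in Jacobi's
triple product identity to obtain

$$\prod_{k \ge 0} (1 - t^{2ak+2a})(1 + t^{2ak+a+b})(1 + t^{2ak+a-b}) = \sum_{m \in \mathbb{Z}} t^{am^2+bm}.$$

The special case $a = b = \tfrac12$ yields

$$\prod_{k \ge 0} (1 - t^{k+1})(1 + t^{k+1})(1 + t^{k}) = \sum_{m \in \mathbb{Z}} t^{m(m+1)/2}.$$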
k>0 meZ 
¹This proof gave birth to the circle method, still one of the most important techniques in number
theory. Proving this result is too difficult for this book, but see for many uses of the circle
method.




Exercise 15.2.2. By writing $1 + t^k$ as $(1 - t^{2k})/(1 - t^k)$ or otherwise, deduce that

$$\sum_{m \ge 0} t^{m(m+1)/2} = \frac{(1-t^2)(1-t^4)(1-t^6)\cdots}{(1-t)(1-t^3)(1-t^5)\cdots}.$$
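The identity in exercise 15.2.2 can be checked coefficient by coefficient. In this Python sketch (ours), a power series is a list of integer coefficients truncated at $t^{300}$; multiplying by $1 - t^a$ and dividing by $1 - t^a$ are each a single pass over the list.

```python
def times_one_minus(f, a):
    """Coefficients of f(t) * (1 - t^a), truncated to len(f) terms."""
    return [f[n] - (f[n - a] if n >= a else 0) for n in range(len(f))]

def divide_by_one_minus(f, a):
    """Coefficients of f(t) / (1 - t^a), truncated to len(f) terms."""
    g = list(f)
    for n in range(a, len(g)):
        g[n] += g[n - a]
    return g

N = 300
series = [1] + [0] * N
for a in range(1, N + 1):
    # numerator factors (1 - t^even), denominator factors (1 - t^odd)
    series = times_one_minus(series, a) if a % 2 == 0 else divide_by_one_minus(series, a)

triangular = [0] * (N + 1)
m = 0
while m * (m + 1) // 2 <= N:
    triangular[m * (m + 1) // 2] = 1
    m += 1
print(series == triangular)  # True
```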


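Letting $x = t^a$, $z = -t^b$, and $n = k+1$ in Jacobi's triple product identity we
obtain

$$\prod_{k \ge 0} (1 - t^{2ak+2a})(1 - t^{2ak+a+b})(1 - t^{2ak+a-b}) = \sum_{m \in \mathbb{Z}} (-1)^m t^{am^2+bm}.$$

The special case $a = \tfrac32$, $b = \tfrac12$ yields Euler's identity,

$$\prod_{n \ge 1} (1 - t^n) = \sum_{m \in \mathbb{Z}} (-1)^m t^{(3m^2+m)/2}.$$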
n>1 meZ 


Exercise 15.2.3. Interpret this combinatorially, in terms of the number of partitions of m into 
unequal parts. 


Exercise 15.2.4. Show that if $\left(\frac{12}{\cdot}\right)$ is the Jacobi symbol, then

$$t \prod_{n \ge 1} (1 - t^{24n}) = \sum_{m \ge 1} \Big(\frac{12}{m}\Big) t^{m^2}.$$
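Euler's identity and the identity in exercise 15.2.4 can both be confirmed by expanding the products. In this Python sketch (ours), $\left(\frac{12}{m}\right)$ is computed from the residue of $m$ mod 12: it is $+1$ for $m \equiv \pm 1$, $-1$ for $m \equiv \pm 5$, and we set it to 0 when $\gcd(m, 6) > 1$.

```python
def euler_product(N, step=1):
    """Coefficients of prod_{n>=1} (1 - t^(step*n)) up to t^N."""
    f = [1] + [0] * N
    for a in range(step, N + 1, step):
        f = [f[n] - (f[n - a] if n >= a else 0) for n in range(N + 1)]
    return f

def pentagonal_side(N):
    """Coefficients of sum_{m in Z} (-1)^m t^((3m^2+m)/2) up to t^N."""
    g = [0] * (N + 1)
    m = 0
    while (3 * m * m - m) // 2 <= N:
        for k in {m, -m}:
            e = (3 * k * k + k) // 2
            if e <= N:
                g[e] += (-1) ** abs(k)
        m += 1
    return g

def symbol_12(m):
    """(12/m) for m >= 1, read off from m mod 12 (0 when gcd(m, 6) > 1)."""
    return {1: 1, 11: 1, 5: -1, 7: -1}.get(m % 12, 0)

N = 2000
lhs = [0] + euler_product(N - 1, step=24)  # t * prod (1 - t^(24n))
rhs = [0] * (N + 1)
m = 1
while m * m <= N:
    rhs[m * m] += symbol_12(m)
    m += 1
print(euler_product(500) == pentagonal_side(500), lhs == rhs)  # True True
```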
Now let $z = -ux$ in Jacobi's triple product identity to obtain

$$\prod_{n \ge 1} (1 - x^{2n})(1 - ux^{2n})(1 - x^{2n-2}u^{-1}) = \sum_{m \in \mathbb{Z}} (-u)^m x^{m^2+m}.$$

The third factor in the product on the left-hand side is $1 - u^{-1}$ at $n = 1$. The sum
of the $m$ and $-m-1$ terms on the right-hand side is

$$(-1)^m (1 - u^{-1})(u^m + u^{m-1} + \cdots + u^{-m})\, x^{m^2+m}.$$

Dividing through by $1 - u^{-1}$ and then letting $u = 1$, we obtain

$$\prod_{n \ge 1} (1 - x^{2n})^3 = \sum_{m \ge 0} (-1)^m (2m+1)\, x^{m^2+m}.$$


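Writing $x = t^4$ and $a = 2m+1$ (so that $4(m^2 + m) = a^2 - 1$), and multiplying through by $t$, we have

$$t \prod_{n \ge 1} (1 - t^{8n})^3 = \sum_{\substack{a \ge 1 \\ a \text{ odd}}} (-1)^{(a-1)/2}\, a\, t^{a^2}. \tag{15.2.2}$$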
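Identity (15.2.2) can be confirmed coefficient by coefficient. A Python sketch (ours), truncating both sides at $t^{1500}$:

```python
def jacobi_cube_sides(N):
    """Coefficients up to t^N of t*prod_{n>=1}(1 - t^(8n))^3 and of sum_{a odd} (-1)^((a-1)/2) a t^(a^2)."""
    prod = [1] + [0] * N
    for a in range(8, N + 1, 8):
        for _ in range(3):  # multiply by (1 - t^a) three times
            prod = [prod[n] - (prod[n - a] if n >= a else 0) for n in range(N + 1)]
    lhs = [0] + prod[:N]  # multiply by t
    rhs = [0] * (N + 1)
    a = 1
    while a * a <= N:
        rhs[a * a] = (-1) ** ((a - 1) // 2) * a
        a += 2
    return lhs, rhs

lhs, rhs = jacobi_cube_sides(1500)
print(lhs == rhs)                                     # True
print([(n, c) for n, c in enumerate(lhs[:50]) if c])  # [(1, 1), (9, -3), (25, 5), (49, -7)]
```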


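We now replace $t$ by $t^4$ and write $b = 2m$ in exercise 15.2.1, and multiply this by (15.2.2) to
obtain

$$t \prod_{n \ge 1} (1 - t^{4n})^2 (1 - t^{8n})^2 = \sum_{\substack{a \ge 1,\ a \text{ odd} \\ b \in \mathbb{Z},\ b \text{ even}}} (-1)^{(a-1+b)/2}\, a\, t^{a^2+b^2}. \tag{15.2.3}$$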

What are the coefficients of $t^p$ in this power series, when $p$ is a prime? We know
that $p = a^2 + b^2$ with $a$ odd and $b$ even if and only if $p \equiv 1 \pmod 4$. In that
case, $a$ and $b$ are unique up to sign, with $b \equiv \frac{p-1}{2} \pmod 4$, and if $A = (-1)^{(a-1)/2} a$,
then $A \equiv 1 \pmod 4$. We deduce from Theorem and exercise 14.5.2(e) that
the coefficient of $t^p$ equals (accounting for the two possible signs of $b$)

$$p - \#\{(x,y) \pmod p : y^2 \equiv x^3 - x \pmod p\}.$$




This is also true (but less interesting) for $p = 2$ and for the primes $p \equiv 3 \pmod 4$,
hence for all primes $p$. This might seem like a not terribly interesting coincidence
but, surprisingly, it is not a coincidence, and it generalizes to an extraordinary
extent: For any cubic polynomial $f(x)$ without repeated roots mod $p$, the quantity

$$p - \#\{(x,y) \pmod p : y^2 \equiv f(x) \pmod p\}$$

is the coefficient of $t^p$ in a certain power series with very special properties. Proving
a precise version of this is the key to Andrew Wiles's proof of Fermat's Last Theorem.
There is no way for the reader to see the connection from what I have written
here; that would take a whole other book. We will understand much more about
this strange link between seemingly very different types of questions in [Grab].


Exercise 15.2.5. Write the power series on the right-hand side of (15.2.3) as $\sum_{m \ge 1} f(m)\, t^m$.
(a)? Prove that f(pq) = f(p)f(q) for any distinct primes p, q. 
(b)* Prove that f(-) is a multiplicative function. 
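Both claims, that the coefficient of $t^p$ is $p - \#\{(x,y) : y^2 \equiv x^3 - x\}$ and that the coefficients $f(m)$ are multiplicative (exercise 15.2.5), can be tested numerically. This Python sketch (ours) expands the product side of (15.2.3), checks it against the double sum, and compares with brute-force point counts. A numerical check is, of course, not a proof.

```python
import math

def series_coefficients(N):
    """f(0), ..., f(N) for t * prod_{n>=1} (1 - t^(4n))^2 (1 - t^(8n))^2, the left side of (15.2.3)."""
    prod = [1] + [0] * N
    for a in range(4, N + 1, 4):
        power = 4 if a % 8 == 0 else 2  # (1 - t^a) occurs in both products when 8 divides a
        for _ in range(power):
            prod = [prod[n] - (prod[n - a] if n >= a else 0) for n in range(N + 1)]
    return [0] + prod[:N]

def double_sum_coefficients(N):
    """Right side of (15.2.3): a >= 1 odd, b even, weight (-1)^((a-1+b)/2) a."""
    f = [0] * (N + 1)
    B = math.isqrt(N)
    for a in range(1, B + 1, 2):
        for b in range(-B, B + 1):
            if b % 2 == 0 and a * a + b * b <= N:
                f[a * a + b * b] += (-1 if ((a - 1 + b) // 2) % 2 else 1) * a
    return f

def affine_points(p):
    """#{(x, y) mod p : y^2 = x^3 - x (mod p)}, by counting square roots."""
    roots = [0] * p
    for y in range(p):
        roots[y * y % p] += 1
    return sum(roots[(x ** 3 - x) % p] for x in range(p))

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

N = 400
f = series_coefficients(N)
print(f == double_sum_coefficients(N))                                            # True
print(all(f[p] == p - affine_points(p) for p in range(2, N + 1) if is_prime(p)))  # True
print(f[5], f[13], f[65])                                                         # -2 6 -12
```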


Exercise 15.2.6 (Proof of Jacobi's triple product identity). Define

$$P_N(x, z) := \prod_{k=1}^{N} (1 - x^{2k})(1 + x^{2k-1}z)(1 + x^{2k-1}z^{-1}) = \sum_{n \in \mathbb{Z}} c_{N,n}(x)\, z^n.$$

(a) Prove that $c_{N,n}(x) \in \mathbb{Z}[x]$; $c_{N,n}(x) = 0$ for $|n| > N$, and $c_{N,-n}(x) = c_{N,n}(x)$.
(b) Prove that $P_{N+1}(x,z) = (1 - x^{2N+2})(1 + x^{2N+1}z)(1 + x^{2N+1}z^{-1})\, P_N(x,z)$.
(c)† Deduce that

$$c_{N+1,n}(x) = (1 - x^{2N+2})\big(x^{2N+1} c_{N,n-1}(x) + (1 + x^{4N+2})\, c_{N,n}(x) + x^{2N+1} c_{N,n+1}(x)\big).$$

(d) Prove that $c_{N,n}(x) = x^{n^2} \prod_{m=N-n+1}^{N} (1 - x^{2m}) \cdot \prod_{m=N+n+1}^{2N} (1 - x^{2m})$ for $0 \le n \le N$, for
all $N \ge 1$.
(e) Show that if $|x| < 1$, then $\lim_{N \to \infty} c_{N,n}(x) = x^{n^2}$ for every integer $n$.
(f) Deduce Jacobi's triple product identity.


15.3. The Freiman-Ruzsa Theorem 


We pass now to a much more modern theme in combinatorial number theory, a
new subject called additive combinatorics, which combines older themes with new
perspectives. The main question is to determine when sets of integers contain
subsets with special structure. For example, any subset $S$ of the integers up to $N$
contains two consecutive integers if $|S| \ge N/2 + 1$, and contains three consecutive
integers if $|S| \ge 2N/3 + 1$, and these results are best possible. But in this latter case,
three consecutive integers are a special example of three integers in an arithmetic
progression, so we might ask how big $S$ needs to be to guarantee that it contains
a non-trivial three-term arithmetic progression, or even a $k$-term arithmetic
progression. Another key theme is to start with a sparse set of integers, like, say,
the squares, which have no four-term arithmetic progression (as we will establish
in Theorem 17.5), and then ask whether we will obtain longer arithmetic progressions
among the set of integers that are the sum of two squares, that is, by adding
together elements of our original set.


For any finite sets of integers $A$ and $B$, we define their sumset

$$A + B := \{a + b : a \in A,\ b \in B\}.$$

We write $2A$ for $A + A$. It is easy to see that $|2A| \le |A|(|A|+1)/2$, since the distinct
elements of $A + A$ are a subset of $\{a_i + a_j : 1 \le i \le j \le |A|\}$.




Exercise 15.3.1. Give an example of a set $A$ of $n$ integers for which $|2A| = n(n+1)/2$.


Typically |A + A| is large and is only small in very special circumstances: 


Lemma 15.3.1. If $A$ and $B$ are finite subsets of $\mathbb{Z}$, then $|A + B| \ge |A| + |B| - 1$.
When $|A|, |B| \ge 2$, equality holds if and only if $A$ and $B$ are each complete finite
segments of arithmetic progressions with the same common difference (that is,
$A = \{a, a+d, \ldots, a+(r-1)d\}$ and $B = \{b, b+d, \ldots, b+(s-1)d\}$ for some $a, b, r, s$, and $d \ge 1$).


Proof. Write the elements of $A$ as $a_1 < a_2 < \cdots < a_r$, and those of $B$ as $b_1 <
b_2 < \cdots < b_s$. Then $A + B$ contains the $r + s - 1$ distinct integers

$$a_1 + b_1 < a_1 + b_2 < a_1 + b_3 < \cdots < a_1 + b_s < a_2 + b_s < a_3 + b_s < \cdots < a_r + b_s.$$

If $|A + B| = r + s - 1$, then these must be the same integers, in the same order, as

$$a_1 + b_1 < a_2 + b_1 < a_2 + b_2 < a_2 + b_3 < \cdots < a_2 + b_{s-1} < a_2 + b_s < \cdots < a_r + b_s.$$

Comparing terms, we have $a_1 + b_{i+1} = a_2 + b_i$ for $1 \le i \le s - 1$; that is, $b_j =
b_1 + (j-1)d$ where $d = a_2 - a_1$. A similar argument with the roles of $a$ and $b$
swapped reveals our result.
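For small sets the lemma, including the equality case for $|A|, |B| \ge 2$, can be confirmed exhaustively. The following Python sketch (ours) runs over all pairs of subsets of $\{0, \ldots, 7\}$ with between 2 and 4 elements.

```python
from itertools import combinations

def sumset(A, B):
    return {a + b for a in A for b in B}

def is_ap(S, d):
    """True if the sorted elements of S have all consecutive gaps equal to d."""
    s = sorted(S)
    return all(y - x == d for x, y in zip(s, s[1:]))

def check_lemma(universe_size=8, max_size=4):
    subsets = [set(c) for r in range(2, max_size + 1)
               for c in combinations(range(universe_size), r)]
    for A in subsets:
        d = sorted(A)[1] - sorted(A)[0]
        for B in subsets:
            size = len(sumset(A, B))
            if size < len(A) + len(B) - 1:
                return False
            if (size == len(A) + len(B) - 1) != (is_ap(A, d) and is_ap(B, d)):
                return False
    return True

print(check_lemma())  # True
```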


If $A + B$ is small, but not quite as small as $|A| + |B| - 1$, then we might expect
some similar structure. However the combinatorics of comparing different sums
quickly becomes very complicated.

Looking for other examples in which $A + B$ is small, one soon finds the
possibility that $A$ and $B$ are each large subsets of complete finite segments of
arithmetic progressions with the same common difference. For example, if $A$ contains $2m$
integers from $\{1, 2, 3, \ldots, 3m\}$, then $A + A$ is a subset of $\{2, 3, 4, \ldots, 6m-1, 6m\}$, and so
$|A + A| < 3|A|$. One can find a criterion similar to Lemma 15.3.1: If $|A| \ge |B|$ and
$|A + B| \le |A| + 2|B| - 4$, then $A$ and $B$ are each subsets of arithmetic progressions with
the same common difference, of lengths $\le |A + B| - |B| + 1$ and $\le |A + B| - |A| + 1$,
respectively.

A further interesting example is given by

$$A = B = \{1, 2, \ldots, 10, 101, 102, \ldots, 110, 201, 202, \ldots, 210\}$$

or its large subsets. This can be written as $1 + \{0, 1, 2, \ldots, 9\} + \{0, 100, 200\}$, a
translate of the sum of complete finite segments of two arithmetic progressions. More
generally, define a generalized arithmetic progression $C = C(a_0, \ldots, a_k; N_1, \ldots, N_k)$
as

$$C := \{a_0 + a_1 n_1 + a_2 n_2 + \cdots + a_k n_k : 0 \le n_j \le N_j - 1 \text{ for } 1 \le j \le k\},$$

where $a_0, a_1, \ldots, a_k$ are given integers and $N_1, \ldots, N_k$ are given positive integers.
Note that $C(a_0, a_1, \ldots, a_k; N_1, N_2, \ldots, N_k) = a_0 + \sum_{i=1}^{k} a_i \cdot \{0, 1, \ldots, N_i - 1\}$. This
generalized arithmetic progression is said to have dimension $k$ and volume $N_1 \cdots N_k$.
Notice that

$$2C(a_0, a_1, \ldots, a_k; N_1, N_2, \ldots, N_k) = C(2a_0, a_1, \ldots, a_k; 2N_1 - 1, 2N_2 - 1, \ldots, 2N_k - 1),$$

so that $|2C| \le 2^k |C|$ (as long as the elements of $C$ are distinct). We think of $C$ as
an image of the $k$-dimensional lattice segment

$$S = \{(n_1, \ldots, n_k) : 0 \le n_j \le N_j - 1 \text{ for } 1 \le j \le k\} \subset \mathbb{Z}^k.$$

Here $|2S| \le 2^k |S|$ and $S$ is the set of lattice points inside a $k$-dimensional rectangle.
We can replace the rectangle by the lattice points inside a "never too thin" convex,
compact region of $\mathbb{R}^k$ and get the same bound $|2S| \le 2^k |S|$.²


We can combine these two constructions so that if $A$ is a large subset of a
generalized arithmetic progression, then $|A + A| \le K|A|$ for some smallish constant
$K$.
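The doubling formula for generalized arithmetic progressions is easy to check on the example above. This Python sketch (ours; `gap` is a hypothetical helper name) builds $C(1, 1, 100; 10, 3)$, which is exactly the set $A$ above, and compares $2C$ with $C(2, 1, 100; 19, 5)$.

```python
from itertools import product

def gap(a0, steps, lengths):
    """C(a0, a1, ..., ak; N1, ..., Nk) = {a0 + a1 n1 + ... + ak nk : 0 <= nj <= Nj - 1}."""
    return {a0 + sum(a * n for a, n in zip(steps, ns))
            for ns in product(*(range(N) for N in lengths))}

C = gap(1, (1, 100), (10, 3))  # 1 + {0, ..., 9} + {0, 100, 200}
two_C = {x + y for x in C for y in C}
print(len(C), len(two_C))                                                 # 30 95
print(two_C == gap(2, (1, 100), (19, 5)), len(two_C) <= 2 ** 2 * len(C))  # True True
```

Here $|2C| = 19 \cdot 5 = 95$, comfortably below $2^2 |C| = 120$.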


Are there any other examples of sets $A$ and $B$ for which $A + B$ is small? Freiman
showed that the answer is "no", having the extraordinary insight to suggest and
prove that $A + A$ can be "small" if and only if $A$ is a "large" subset of a
"low"-dimensional generalized arithmetic progression of "not too big" volume.³

Theorem 15.1 (The Freiman-Ruzsa Theorem⁴). For any constant $K > 2$ there
exist constants $d(K)$ and $V(K)$ such that if $A$ is a set of integers for which $|A + A| \le
K|A|$, then $A$ is a subset of a $d$-dimensional generalized arithmetic progression of
volume $V \cdot |A|$, where $d \le d(K)$ and $V \le V(K)$.


Define the product set $A \cdot B := \{ab : a \in A,\ b \in B\}$.


Exercise 15.3.2. Explain the bijection between $A \cdot B$ and $\log A + \log B$.


We have seen that if $A + A$ is small, then $A$ has a lot of additive structure; that
is, by the Freiman-Ruzsa Theorem, it is a large subset of a low-dimensional generalized
arithmetic progression of small volume. If $A \cdot A$ is small, then $\log A$ has a lot of additive
structure by the Freiman-Ruzsa Theorem; that is, $A$ has a lot of multiplicative
structure. Can a set have both types of structure at once?


When $A$ is the set of integers $\{1, 2, \ldots, N\}$, we have $|A + A| < 2N$, so it is
small. We saw that $|A \cdot A| = o(N^2)$ (during our discussion of Erdős's multiplication
table problem, in section 13.3). By taking products of pairs of primes $\le N$, we see
that $|A \cdot A| \ge \pi(N)^2/2 \ge N^2/(3(\log N)^2)$, so that $A \cdot A$ is not much smaller than $N^2$.
One might guess that $A \cdot A$ is large whenever $A + A$ is small.


Erdős and Szemerédi conjectured that a set cannot have this kind of additive
structure and multiplicative structure simultaneously, by predicting the sum-product
inequality,

$$\max\{|A + A|, |A \cdot A|\} \ge c_\epsilon |A|^{2-\epsilon}$$

for some constant $c_\epsilon > 0$, for any $\epsilon > 0$. More daringly one might guess, from the
same reasoning, that either $|A + B| \ge c_\epsilon (|A||B|)^{1-\epsilon}$ or $|A \cdot C| \ge c_\epsilon (|A||C|)^{1-\epsilon}$ for
any finite sets of integers $A, B, C$. In 2009, Solymosi showed that if $A$ and $B$ are
two finite sets of real numbers with $|A| \ge |B| > 1$, then


$$|A \cdot B|\,|A + A|\,|B + B| \ge c\, \frac{(|A||B|)^2}{\log |A|}$$

for some constant $c > 0$. We can deduce from this, for example, that if $|A + A| \le
K|A|$ and $|B + B| \le K|B|$, then $|A \cdot B| \ge c' \frac{|A||B|}{\log |A|}$ where $c' = c/K^2$.

²We need to be cautious with thin regions. For example, the rectangle $R := \{(x,y) : \frac14 \le x \le
\frac34,\ 0 \le y \le N\}$ has volume $N/2$ and contains 0 lattice points, whereas $2R$ has volume $2N$ and contains
$2N + 1$ lattice points.

³Freiman's 1962 proof is both long and difficult to understand. Ruzsa's 1994 proof of Freiman's
result is elegant and introduced new techniques, heralding an explosion of ideas in this area.

⁴Who should get credit for a theorem? Only the person who had the insight and power to prove
it? Or perhaps someone who gave a later proof that inspired others to be interested? There is no rule.
Here we have chosen to give Ruzsa credit, as well as Freiman who had the original insight. In chapter 17
we discuss Mordell's Theorem, which is usually called the "Mordell-Weil Theorem", thanks to several
brilliant insights that Weil brought to the subject, as well as an extraordinary generalization.


Exercise 15.3.3. Deduce the sum-product inequality for any $\epsilon > 2/3$.


15.4. Expansion and the Plünnecke-Ruzsa inequality


Suppose that $A$ is an additive set with $|2A| \le K|A|$. By the Freiman-Ruzsa Theorem,
we know that $A \subset P$, where $P$ is a generalized arithmetic progression of small
dimension and $A$ is a large portion of $P$. Adding several copies of the inclusion
$A \subset P$ together, it follows that $kA \subset kP$ for any integer $k \ge 2$. The size $|kP|$
grows in a fairly controlled manner as a function of $k$, which implies that $|kA|$
cannot grow too rapidly. Such a statement can be proven directly:⁵


Theorem 15.2 (Plünnecke-Ruzsa Theorem). If $A$ and $B$ are finite sets of integers
for which $|A + B| \le K|A|$, then $|mB - nB| \le K^{m+n}|A|$ for all integers $m, n \ge 0$.


In particular, taking $B = A$ we see that if $|A + A| \le K|A|$, then $|nA| \le K^n|A|$
for all $n \ge 1$. Also, taking $B = A$ or $B = -A$ yields that if $|A + A| \le K|A|$, then
$|mA - nA| \le K^{m+n}|A|$.

A first result in adding sets is the Ruzsa triangle inequality, which states that
for any given finite sets of integers $U, V, W$ one has

$$|V - W|\,|U| \le |V - U|\,|U - W|. \tag{15.4.1}$$

Proof. We will define a map $\phi : (V - W) \times U \to (V - U) \times (U - W)$ and prove
that it is an injection, which implies the result. Given $d \in V - W$ select a pair
$v_d \in V$, $w_d \in W$ for which $d = v_d - w_d$ (there may be more than one such pair, but
for each $d$ we make a definite choice). Then define

$$\phi(d, u) = (v_d - u,\ u - w_d)$$

for each $d \in V - W$ and $u \in U$. To prove that $\phi$ is an injection, suppose that
$(x, y) \in \operatorname{Im}(\phi) \subset (V - U) \times (U - W)$. If $\phi(d, u) = (x, y)$, then $x + y = (v_d - u) +
(u - w_d) = v_d - w_d = d$, and therefore we can determine $d$, and hence $v_d$ and $w_d$,
from $(x, y)$. And we also determine $u$ as $u = -x + v_d$ $(= y + w_d)$.
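The injection in this proof can be implemented verbatim and tested. In this Python sketch (ours), `ruzsa_map` fixes one representation $d = v_d - w_d$ for each difference and builds $\phi$; on random triples we check that $\phi$ is injective, that it lands in $(V - U) \times (U - W)$, and that (15.4.1) holds.

```python
import random

def difference_set(X, Y):
    return {x - y for x in X for y in Y}

def ruzsa_map(U, V, W):
    """phi(d, u) = (v_d - u, u - w_d), after fixing one representation d = v_d - w_d per d."""
    rep = {}
    for v in sorted(V):
        for w in sorted(W):
            rep.setdefault(v - w, (v, w))  # keep the first representation found
    return {(d, u): (vd - u, u - wd) for d, (vd, wd) in rep.items() for u in U}

def check_ruzsa(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        U, V, W = (set(rng.sample(range(40), rng.randint(1, 8))) for _i in range(3))
        phi = ruzsa_map(U, V, W)
        VU, UW = difference_set(V, U), difference_set(U, W)
        if len(set(phi.values())) != len(phi):               # phi must be injective
            return False
        if not all(x in VU and y in UW for x, y in phi.values()):
            return False
        if len(difference_set(V, W)) * len(U) > len(VU) * len(UW):  # (15.4.1)
            return False
    return True

print(check_ruzsa())  # True
```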


Our proof of the Plünnecke-Ruzsa Theorem (due to George Petridis) rests on
the following "expansion" result.


Proposition 15.4.1. Suppose $A$ and $B$ are sets of integers for which $|A + B| \le
K|A|$. Let $X$ be a nonempty subset of $A$ for which the ratio $|X + B|/|X|$ is minimal. Then

$$|S + X + B| \le K|S + X|$$

for every finite set of integers $S$.
A fairly short proof of this result may be found in [@G]. An easy consequence
is that

$$|X + kB| \le K^k |X|. \tag{15.4.2}$$


⁵Rather than as a consequence of the Freiman-Ruzsa Theorem. Indeed it is an important ingredient
in Ruzsa's proof of the Freiman-Ruzsa Theorem.


We prove this by induction: It is trivial for $k = 0$. If it is true for $k - 1$, then, by
Proposition 15.4.1 with $S = (k-1)B$, we have

$$|X + kB| = |X + S + B| \le K|X + S| \le K \cdot K^{k-1}|X| = K^k |X|.$$

Proof of the Plünnecke-Ruzsa Theorem. By Ruzsa's triangle inequality with
$U, V, W$ replaced by $X, -mB, -nB$, respectively, and then (15.4.2), we have

$$|mB - nB|\,|X| \le |X + mB| \cdot |X + nB| \le K^{m+n}|X|^2.$$

Therefore $|mB - nB| \le K^{m+n}|X| \le K^{m+n}|A|$ as $X \subset A$.
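Petridis's argument is constructive enough to test on small examples. In this Python sketch (ours), `petridis_subset` finds a nonempty $X \subset A$ minimizing $|X + B|/|X|$ by exhaustive search; we then check Proposition 15.4.1 for random sets $S$ and the Plünnecke-Ruzsa bound for $0 \le m, n \le 3$, using exact fractions for $K = |A + B|/|A|$.

```python
from fractions import Fraction
from itertools import combinations
import random

def add(X, Y):
    return {x + y for x in X for y in Y}

def multiple(B, k):
    """kB = B + ... + B (k copies), with 0B = {0}."""
    S = {0}
    for _ in range(k):
        S = add(S, B)
    return S

def petridis_subset(A, B):
    """A nonempty X subset of A minimizing |X + B| / |X| (ties broken by search order)."""
    best_ratio, best_X = None, None
    for r in range(1, len(A) + 1):
        for X in combinations(sorted(A), r):
            ratio = Fraction(len(add(X, B)), r)
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_X = ratio, set(X)
    return best_X

def check(A, B, rng, trials=20, max_mn=3):
    K = Fraction(len(add(A, B)), len(A))
    X = petridis_subset(A, B)
    for _ in range(trials):  # Proposition 15.4.1 for random finite S
        S = set(rng.sample(range(-20, 21), rng.randint(1, 6)))
        if len(add(add(S, X), B)) > K * len(add(S, X)):
            return False
    for m in range(max_mn + 1):  # the Plünnecke-Ruzsa bound
        for n in range(max_mn + 1):
            mB_minus_nB = {x - y for x in multiple(B, m) for y in multiple(B, n)}
            if len(mB_minus_nB) > K ** (m + n) * len(A):
                return False
    return True

rng = random.Random(1)
results = [check(set(rng.sample(range(30), rng.randint(2, 7))),
                 set(rng.sample(range(30), rng.randint(2, 7))), rng) for _ in range(30)]
print(all(results))  # True
```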


Another use of Proposition 15.4.1 comes in noting that if $x \in X$, then $B + S +
x \subset B + S + X$, so that, taking $K = |A + B|/|A|$, we have $|B + S| = |B + S + x| \le |B + S + X| \le K|X + S| \le K|A + S|$;
that is, $|B + S|\,|A| \le |A + B|\,|A + S|$. Changing notation gives

$$|V + W|\,|U| \le |V + U|\,|U + W|, \tag{15.4.3}$$

the Petridis triangle inequality, which nicely complements the Ruzsa triangle
inequality.


15.5. Schnirel'man's Theorem


We call a set of non-negative integers $A$ an asymptotic basis of order $h$ if every
sufficiently large integer is the sum of at most $h$ elements of $A$. The most famous
example is Goldbach's conjecture, which asserts that every even integer $\ge 4$ is the
sum of two primes. This implies that every integer $\ge 2$ is the sum of at most 3
primes (by taking $n - 3$ as the sum of two primes for every odd integer $n \ge 7$).
Therefore the Goldbach conjecture implies that the primes form an asymptotic
basis of order 3.


Suppose that $A$ contains 0. Then

$$A \subset 2A \subset 3A \subset \cdots \subset hA.$$

Therefore $A$ is an asymptotic basis of order $h$ if and only if every sufficiently large
integer is the sum of exactly $h$ elements of $A$ (that is, $\mathbb{N} \setminus hA$ is finite). A pervasive
phenomenon in additive number theory is the enrichment of structure that one sees
when moving from $A$ to $2A$, to $3A$, and so on. In this section we explore ideas of
Schnirel'man which allow us to quantify the intuition that the sets $jA$ should get
bigger as $j$ increases.

We can measure the size of an infinite set of integers $A$ by various notions of density.
The most useful for studying the addition of sets is Schnirel'man density. The
Schnirel'man density of a set $A \subset \mathbb{Z}$ is denoted by $\sigma(A)$, and it is defined by

$$\sigma(A) := \inf_{n \ge 1} \frac{A[n]}{n},$$

where $A[n] := \#\{a \in A : 1 \le a \le n\}$. One obvious consequence of the definition
of Schnirel'man density is that

$$A[n] \ge n\,\sigma(A) \quad \text{for all } n \ge 1.$$

Taking $n = 1$, we immediately see that the Schnirel'man density has the slightly
strange property that $\sigma(A) = 0$ unless $1 \in A$.


Using the pigeonhole principle we prove the following simple, but useful, fact
about Schnirel'man density.


Lemma 15.5.1. Let $A$ and $B$ be sets of integers, both containing 0. If $\sigma(A) +
\sigma(B) \ge 1$, then $A + B$ contains all nonnegative integers.


Proof. We have $0 = 0 + 0 \in A + B$. Suppose that $n \ge 1$ and $n \notin A + B$. Then
$n \notin A$, or else $n = n + 0 \in A + B$. Similarly $n \notin B$. Moreover we cannot have
$a \in A$ and $n - a \in B$, for any $a$, $0 < a < n$, or else $n = a + (n - a) \in A + B$. This
implies that $1_A(a) + 1_B(n - a) \le 1$ for all such $a$ (where $1_A$ is the characteristic
function for $A$). These two observations together imply that

$$A[n] + B[n] = A[n-1] + B[n-1] = \sum_{a=1}^{n-1} \big(1_A(a) + 1_B(n - a)\big) \le n - 1.$$

On the other hand

$$A[n] + B[n] \ge \sigma(A)n + \sigma(B)n \ge n.$$

These two inequalities are contradictory, and so the result follows.


Next we come to a more serious fact, which quantifies the intuition that $A + B$
is measurably bigger than $A$ and $B$.


Theorem 15.3 (Schnirel'man's Theorem). Let $A$ and $B$ be sets of integers, and
suppose that $1 \in A$ and $0 \in B$. Then $\sigma(A + B) \ge \sigma(A) + \sigma(B) - \sigma(A)\sigma(B)$.
This can be reformulated as $1 - \sigma(A + B) \le (1 - \sigma(A))(1 - \sigma(B))$.


Proof. We denote the elements of $A$ up to $n$ as $1 = a_1 < \cdots < a_k \le n$, so that
$k \ge \sigma(A)n$. We write $a_{k+1} := n + 1$ for convenience in this proof (even though this
element might not lie in $A$). We obtain a lower bound on $(A + B)[n]$ by counting
only those elements of $A + B$ that lie in $[a_i, a_{i+1})$ which take the form $a_i + b$ with
$b \in B$, so that $0 \le b \le a_{i+1} - a_i - 1$. Therefore

$$\begin{aligned} |(A + B) \cap [a_i, a_{i+1})| &\ge |B \cap \{0, \ldots, a_{i+1} - a_i - 1\}| = 1 + B[a_{i+1} - a_i - 1] \\ &\ge 1 + \sigma(B)(a_{i+1} - a_i - 1) = \sigma(B)(a_{i+1} - a_i) + 1 - \sigma(B). \end{aligned}$$

Since the intervals $[a_i, a_{i+1})$, $1 \le i \le k$, partition the integers in $[1, n]$, we deduce
that

$$(A + B)[n] = \sum_{i=1}^{k} |(A + B) \cap [a_i, a_{i+1})| \ge \sum_{i=1}^{k} \big(\sigma(B)(a_{i+1} - a_i) + 1 - \sigma(B)\big) = \sigma(B)\,n + k\,(1 - \sigma(B)),$$

and the result follows as $k \ge \sigma(A)n$.
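The proof above only ever uses $A[m]$ and $B[m]$ for $m \le n$, so the same inequality holds if $\sigma$ is replaced by the minimum of $S[n]/n$ over $1 \le n \le M$ (an assumption-free finite version we introduce for testing). That makes the theorem checkable: this Python sketch (ours) tests it on random sets with $1 \in A$ and $0 \in B$, using exact fractions.

```python
from fractions import Fraction
import random

def density_up_to(S, M):
    """min over 1 <= n <= M of S[n]/n, where S[n] = #{s in S : 1 <= s <= n}."""
    count, best = 0, None
    for n in range(1, M + 1):
        count += n in S
        ratio = Fraction(count, n)
        best = ratio if best is None else min(best, ratio)
    return best

def check_schnirelman(M=80, trials=300, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        pA, pB = rng.random(), rng.random()
        A = {1} | {n for n in range(2, M + 1) if rng.random() < pA}
        B = {0} | {n for n in range(1, M + 1) if rng.random() < pB}
        AB = {a + b for a in A for b in B if a + b <= M}  # only sums up to M matter
        sA, sB, sAB = density_up_to(A, M), density_up_to(B, M), density_up_to(AB, M)
        if sAB < sA + sB - sA * sB:
            return False
    return True

print(check_schnirelman())  # True
```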


An immediate consequence of the above two results is that any set $A$ containing
0, with positive Schnirel'man density, is an asymptotic basis of some finite order
$h$ depending only on $\sigma(A)$. In fact, better than this, $hA$ contains all nonnegative
integers.


Corollary 15.5.1. Let $A$ be a set of integers with $0 \in A$ and $\sigma(A) > 0$. Then
$hA = \mathbb{N}$ whenever $h \ge 2\lceil (\log 2)/(-\log(1 - \sigma(A))) \rceil$. (Here $\lceil t \rceil$ denotes the smallest
integer $\ge t$.)


Proof. As $\sigma(A) > 0$ we know that $1 \in A$, and hence (since $0 \in A$) $1 \in kA$ for all
$k \ge 1$. We deduce, by induction from Schnirel'man's Theorem, that $1 - \sigma(kA) \le
(1 - \sigma(A))^k$ for all $k \ge 1$. If $k \ge \lceil (\log 2)/(-\log(1 - \sigma(A))) \rceil$, then $(1 - \sigma(A))^k \le \frac12$,
and therefore $\sigma(kA) \ge 1/2$. It follows from Lemma 15.5.1 that, for any such $k$,
$2kA = kA + kA$ contains all nonnegative integers.
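To see the corollary in action, take the illustrative set $A = \{0\} \cup \{n \ge 1 : n \equiv 1 \pmod{10}\}$, for which $\sigma(A) = 1/10$. This Python sketch (ours) computes the corollary's bound on $h$ and, for comparison, the smallest $h$ for which $hA$ contains every integer in $[0, 1000]$; the bound is valid but not sharp.

```python
import math
from fractions import Fraction

M = 1000
A = {0} | {n for n in range(1, M + 1) if n % 10 == 1}  # illustrative set with sigma(A) = 1/10

sigma = min(Fraction(sum(1 for a in A if 1 <= a <= n), n) for n in range(1, M + 1))
k = math.ceil(math.log(2) / -math.log(1 - float(sigma)))
bound = 2 * k

# smallest h with {0, ..., M} contained in hA (sums above M are irrelevant)
reachable, h = {0}, 0
while len(reachable) < M + 1:
    reachable = {s + a for s in reachable for a in A if s + a <= M}
    h += 1
print(sigma, bound, h)  # 1/10 14 10
```

Here $10 \notin 9A$, since a sum of at most nine nonzero elements of $A$ that is $\equiv 0 \pmod{10}$ must be empty; so $h = 10$ is genuinely needed, while the corollary guarantees $h = 14$.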


In 1942 Mann improved Schnirel'man's Theorem to

$$\sigma(A + B) \ge \min\{\sigma(A) + \sigma(B), 1\},$$

which is an optimal result of this type.


Schnirel'man's Theorem leads to the first and easiest proof that there exists an
integer $k$ such that every integer $\ge 2$ is the sum of at most $k$ primes; that is, $P$,
the set of primes, is an asymptotic basis for the positive integers. To prove this one
first establishes that $\sigma(2P - 4) > 0$, where $2P - 4 = \{p + q - 4 : p, q \in P\}$ (a result
which is proved using sieve theory; see [Grab]). The result then follows from the
corollary above. In 1937 Vinogradov proved that every sufficiently large odd integer
is the sum of three primes (a result proved in [Graa]), and so $P$ is an asymptotic
basis of order 4 for the positive integers.


15.6. Classical additive number theory 


Given a largish subset of the integers up to N one can ask whether it contains 
certain simple structures, because of its large size. Or, rather than quantify “lar- 
gish”, one might instead partition the integers into two (or more) subsets and ask 
whether any of these subsets have the given structure. This is a familiar theme 
from combinatorics, and ideas from that subject will allow us to give a first answer 
to these questions. 


We begin with a well-known result from graph theory: 


Lemma 15.6.1. For all integers $r \ge 1$ there exists a constant $N(r)$ such that if
the edges of the complete graph with $N$ vertices are colored with $r$ colors, where
$N \ge N(r)$, then there is a monochromatic triangle (that is, the edges of some
triangle in the graph all have the same color).


Proof. By induction on $r \ge 1$. Evidently $N(1) = 3$. For larger $r$ consider the
$N - 1$ edges attached to any one vertex $v$. If $N - 1 \ge rN(r-1)$, then, by the
pigeonhole principle, there must be some color $c$ for which there are $\ge N(r-1)$
edges adjacent to $v$ of color $c$. Let $H$ be those vertices that share an edge of color
$c$ with $v$. If there are any two vertices in $H$ that are attached by an edge of color $c$,
then these two vertices along with $v$ form a monochromatic triangle. Otherwise the
edges of $H$ are colored by just $r - 1$ colors and the result follows by induction.


Exercise 15.6.1. Justify that if $N \ge r(N(r-1) - 1) + 2$, then there must be some color $c$ for
which there are $\ge N(r-1)$ edges adjacent to $v$ of color $c$. Show by induction that we may take
$N(r) \le 3r!$.
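For two colors one can confirm by brute force that 6 vertices are exactly what is needed: some 2-coloring of the edges of $K_5$ has no monochromatic triangle, but every 2-coloring of $K_6$ has one. This Python sketch (ours) also evaluates the recursion $N(1) = 3$, $N(r) = r(N(r-1) - 1) + 2$ implicit in exercise 15.6.1.

```python
from itertools import combinations, product

def has_monochromatic_triangle(n, color):
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def always_monochromatic(n, r=2):
    """True if every r-coloring of the edges of K_n has a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(has_monochromatic_triangle(n, dict(zip(edges, colors)))
               for colors in product(range(r), repeat=len(edges)))

def recursive_bound(r):
    """N(1) = 3 and N(r) = r (N(r-1) - 1) + 2."""
    return 3 if r == 1 else r * (recursive_bound(r - 1) - 1) + 2

print(always_monochromatic(5), always_monochromatic(6))  # False True
print([recursive_bound(r) for r in range(1, 5)])         # [3, 6, 17, 66]
```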




This is a typical Ramsey theory proof, involving a greedy-type algorithm, and 
leads to bounds that are probably far larger than the best bounds possible. However 
there are questions in the subject in which the bounds must grow extraordinarily 
fast and cannot be given in terms of primitive recursive functions. 


There is a quite beautiful corollary: 


Theorem 15.4 (Schur's Theorem). For all integers $r \ge 1$ there exists a constant
$N(r)$ such that if the integers up to $N$ are colored with $r$ colors, where $N \ge N(r)$,
then there is a monochromatic solution to $x + y = z$ in positive integers $x, y, z \le N$.


Proof. We construct the complete graph on $N$ vertices, labeled $1, 2, \ldots, N$, and
join vertices $i$ and $j$ with an edge that has the color of $|j - i|$. Lemma 15.6.1
tells us that there is a monochromatic triangle joining, say, the vertices with labels
$i < j < k$. Therefore if $x = j - i$, $y = k - j$, and $z = k - i$, we know that these
positive integers all have the same color and indeed satisfy $x + y = z$.
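For two colors a brute-force search shows that $N = 5$ suffices and $N = 4$ does not: coloring $\{1, 4\}$ with one color and $\{2, 3\}$ with the other avoids monochromatic solutions of $x + y = z$ (here $x = y$ is allowed). A Python sketch (ours):

```python
from itertools import product

def has_schur_triple(color):
    """color[i] is the color of i for 1 <= i <= N (color[0] unused); x = y is allowed."""
    N = len(color) - 1
    return any(color[x] == color[y] == color[x + y]
               for x in range(1, N + 1) for y in range(x, N + 1 - x))

def always_schur(N, r=2):
    """True if every r-coloring of 1..N has a monochromatic solution of x + y = z."""
    return all(has_schur_triple((None,) + colors) for colors in product(range(r), repeat=N))

print(always_schur(4), always_schur(5))  # False True
```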


In 1927 van der Waerden answered a conjecture of Schur, by showing that if the 
positive integers are partitioned into two sets, then one set must contain arbitrarily 
long arithmetic progressions. 


Theorem 15.5 (van der Waerden's Theorem, 1927). For any given integers $r \ge 2$
and $k \ge 3$ there exists a constant $W(r,k)$ such that however we color the positive
integers $\le W(r,k)$ with $r$ colors, there is a monochromatic $k$-term arithmetic
progression.


Exercise 15.6.2. Prove that W(2,3) = 9. 
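The value in exercise 15.6.2 can at least be confirmed by computer before it is proved by hand. This brute-force Python sketch (ours) runs over all $2^8$ and $2^9$ colorings; the coloring $0,0,1,1,0,0,1,1$ of $1, \ldots, 8$ shows that 8 is not enough.

```python
from itertools import product

def has_monochromatic_ap(coloring, k):
    """True if positions a, a+d, ..., a+(k-1)d (d >= 1) all share one color."""
    N = len(coloring)
    for a in range(N):
        for d in range(1, N):
            if a + (k - 1) * d >= N:
                break
            if len({coloring[a + i * d] for i in range(k)}) == 1:
                return True
    return False

def always_ap(N, k=3, r=2):
    return all(has_monochromatic_ap(c, k) for c in product(range(r), repeat=N))

print(always_ap(8), always_ap(9))  # False True
```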


Exercise 15.6.3. Deduce that if the positive integers are partitioned into r sets, then at least 
one of the sets must contain arbitrarily long arithmetic progressions. 


Exercise 15.6.4. Prove that if we color any arithmetic progression of integers of length W(r, k) 
with r colors, then it will contain a monochromatic k-term arithmetic progression. 


Proof for all $r \ge 2$ given that the $W(2,k)$ exist. (This simple proof explains
why Schur only asked for two colors.) For each fixed $k$ we prove our result by
induction on $r \ge 2$, assuming the $r = 2$ case.


For $r > 2$, let $W := W(r-1, k)$. Suppose that the positive integers $\le N :=
W(2, W)$ are colored with the colors $1, 2, \ldots, r$. We will combine colors $1, 2, \ldots, r-1$
into a new color which we will call color 0, so the integers up to $N$ are colored by
the two colors 0 and $r$. By the definition of $N$, there is either a $W$-term arithmetic
progression of color $r$, which is far more than we were seeking (as $W = W(r-1, k) \ge
k$), or there is a $W$-term arithmetic progression of color 0. Reverting back to our
original coloring, this means that there is an arithmetic progression of integers of
length $W(r-1, k)$, amongst the positive integers $\le N$, which is colored with $r - 1$
colors. The result then follows from exercise 15.6.4.


Exercise 15.6.5. Partition the integers into two sets neither of which has an infinitely long 
arithmetic progression. 


A set of integers $A \subset \mathbb{N}$ has positive lower density if there exists $\delta > 0$ such
that $A[n] \ge \delta n$ for all sufficiently large $n$. The most important result in this area
is the following result of Szemerédi. It tells us that any set of integers of positive
lower density contains arithmetic progressions of arbitrary (finite) length.


Theorem 15.6 (Szemerédi's Theorem, 1975). For any $\delta > 0$ and integer $k \ge 3$,
there exists an integer $N_{k,\delta}$ such that if $N \ge N_{k,\delta}$ and $A \subset \{1, 2, \ldots, N\}$ with
$|A| \ge \delta N$, then $A$ contains an arithmetic progression of length $k$.


Exercise 15.6.6. Show that if $N \ge N_{k,\delta}$ and $A$ is a subset of an arithmetic progression of length
$N$, with $|A| \ge \delta N$, then $A$ contains an arithmetic progression of length $k$.


Exercise 15.6.7. Deduce van der Waerden’s Theorem from Szemerédi’s Theorem. 


The $k = 3$ case was first proved in a cunning proof by Roth in 1952 using Fourier
analysis. In 1969 Szemerédi proved the $k = 4$ case by combinatorial methods and
extended this to all $k$ in 1975. In 1977 Furstenberg reproved Szemerédi's Theorem
in a very surprising manner, using ergodic theory. It was not until 1998 that Tim
Gowers finally gave an analytic proof of Szemerédi's Theorem, a proof based on
the overall plan of Roth, but involving a new kind of higher-dimensional analysis
(partly based on the Freiman-Ruzsa Theorem). Gowers's proof was the starting
point for the proof of the following theorem:


Theorem 15.7 (Green and Tao, 2008). For any integer $N$ one can find (infinitely
many different) pairs of integers $a, d \ge 1$, such that $a, a+d, \ldots, a+(N-1)d$ are
all primes.


We discussed this great result, and its corollaries, in various appendices of 
chapter 5. 


How much further can one develop Szemerédi's Theorem? Erdős conjectured
that any set $A$ of positive integers for which

$$\sum_{a \in A} \frac{1}{a} = \infty$$

must contain arbitrarily long arithmetic progressions, a question that is still very
open today. Erdős stated this conjecture as a means to prove that there are
arbitrarily long arithmetic progressions of primes (but this is not how Green and Tao
proceeded). Can Erdős's conjecture even be proved in the $k = 3$ case?


How large is the largest subset $S(N)$ of $\{1, 2, \ldots, N\}$ that has no three-term
arithmetic progression? If one could show that $|S(N)| < N/\log N$, then one would
know that there are infinitely many three-term arithmetic progressions of primes.
Recently Thomas Bloom, improving on work of Tom Sanders, came agonizingly
close to this goal by showing that $|S(N)| \le c(\log\log N)^4 \cdot N/\log N$ for some constant
$c > 0$. Is this close to the true size of $S(N)$? The best we know is the far smaller
lower bound $|S(N)| \ge N e^{-c\sqrt{\log N}}$ for some $c > 0$ given by a beautiful construction
of Behrend:


Exercise 15.6.8. Show that a,b,c are in arithmetic progression if and only if a+c= 2b. 


Exercise 15.6.9. Write $a = \sum_{i=1}^{k} a_i (2m)^{i-1} \in C := C(0, 1, 2m, \ldots, (2m)^{k-1}; m, \ldots, m)$.
(a) Show that $a$ is an integer in the range $0 \leq a \leq (2m)^k - 1$.
(b) Show that $a, b, c \in C$ are in arithmetic progression if and only if the vectors $\mathbf{a} = (a_1, \ldots, a_k)$,
$\mathbf{b}$, and $\mathbf{c}$ are collinear.
(c) Show that $|\mathbf{a}|^2$ is an integer in the range $0 \leq |\mathbf{a}|^2 \leq km^2$.



(d) Let $C_r = \{a \in C : |\mathbf{a}| = r\}$. Show that no three distinct elements of $C_r$ are collinear.

(e) Prove that there exists an $r$ for which $C_r$ contains $\geq m^k/(1 + km^2)$ elements.

(f)† By selecting $k = \lfloor \sqrt{\log N} \rfloor$ and $m = \lfloor N^{1/k}/2 \rfloor$, prove that if $N$ is sufficiently large, then
$|S(N)| \geq N e^{-c\sqrt{\log N}}$ for some constant $c > 0$.
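The construction in Exercise 15.6.9 can be tried out numerically. The following is a minimal Python sketch, assuming the digits $a_i$ run over $\{0, 1, \ldots, m-1\}$ (so that adding two elements of $C$ produces no carries in base $2m$); the function names `behrend_set` and `has_three_term_ap` are illustrative, not from the text. It groups the elements of $C$ by $|\mathbf{a}|^2$ and keeps the largest class, as in part (e).

```python
from itertools import product


def behrend_set(k, m):
    """Largest class C_r of C, where classes are grouped by |a|^2 of the digit vector.

    Each element is the integer sum_i a_i (2m)^(i-1) with digits 0 <= a_i <= m - 1.
    """
    classes = {}
    for digits in product(range(m), repeat=k):
        squared_length = sum(d * d for d in digits)
        value = sum(d * (2 * m) ** i for i, d in enumerate(digits))
        classes.setdefault(squared_length, []).append(value)
    # Pigeonhole, as in part (e): m^k elements fall into at most 1 + k*m^2 classes.
    return sorted(max(classes.values(), key=len))


def has_three_term_ap(values):
    """True if some a < b < c in `values` satisfy a + c = 2b."""
    s = set(values)
    ordered = sorted(s)
    for i, a in enumerate(ordered):
        for c in ordered[i + 1:]:
            if (a + c) % 2 == 0 and (a + c) // 2 in s:
                return True
    return False
```

For example, `behrend_set(4, 3)` returns a subset of $\{0, \ldots, 6^4 - 1\}$ on which `has_three_term_ap` is `False`; the choice of the class by $|\mathbf{a}|^2$ rather than $|\mathbf{a}|$ is only a convenience, since the two determine each other.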


In this last exercise we found a large subset of $\{0, 1, \ldots, 2m-1\}^k \subset \mathbb{Z}^k$ with
no three-term arithmetic progression and used this to construct a large subset of
$\{1, 2, \ldots, N\}$ that has no three-term arithmetic progression. This suggests extending
the problem in several directions, most fruitfully to $\mathbb{F}_3^n$, where $\mathbb{F}_3$ is the field of
3 elements (see section 7.25 in appendix 7E for details). This differs from $\{0, 1, 2\}^n$
because, for example, $(0,0), (1,2), (2,1)$ is a three-term arithmetic progression in
$\mathbb{F}_3^2$ but not in $\{0, 1, 2\}^2$. It is easier to do arithmetic in a field than in the integers
(which are only a ring), and so researchers have long developed techniques for
bounding the largest subset of $\{1, 2, \ldots, N\}$ that has no three-term arithmetic progression
by bounding the largest subset of $\mathbb{F}_3^n$ that has no three-term arithmetic
progression, where $3^n$ is about the size of $N$. However this approach disintegrated in 2016
when Croot, Lev, and Pach showed that much smaller sets in $\mathbb{F}_3^n$ must contain
three-term arithmetic progressions. Indeed it is now known [5] that if $A$ is a subset
of $\mathbb{F}_3^n$ with $> 2.756^n$ elements, then $A$ contains a three-term arithmetic progression.
(Since $\mathbb{F}_3^n$ contains $N := 3^n$ elements, the bound here is about $N^{0.923}$, much smaller
than $N/\log N$, which is the much sought-after bound in the integer case.)
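For very small $n$ the largest subset of $\mathbb{F}_3^n$ with no three-term arithmetic progression can be found by exhaustive search. The sketch below is only an illustration (it is hopeless beyond $n = 3$ or so); it uses the fact that in $\mathbb{F}_3$ the condition $x + z = 2y$ is the same as $x + y + z = 0$, since $-2 \equiv 1 \pmod 3$. The names `is_progression_free` and `max_progression_free_size` are hypothetical.

```python
from itertools import combinations, product


def is_progression_free(points):
    """True if no three distinct points of F_3^n satisfy x + y + z = 0 coordinatewise."""
    for x, y, z in combinations(points, 3):
        if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)):
            return False
    return True


def max_progression_free_size(n):
    """Brute-force size of the largest progression-free subset of F_3^n (tiny n only)."""
    space = list(product(range(3), repeat=n))
    for size in range(len(space), 0, -1):
        if any(is_progression_free(subset) for subset in combinations(space, size)):
            return size
    return 0
```

For instance, `is_progression_free([(0, 0), (1, 2), (2, 1)])` is `False`, matching the example above, and `max_progression_free_size(2)` is 4.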


15.7. Challenging problems 


Exercise 15.7.1.† Prove that for every set of integers $a_1, \ldots, a_n$ there exists a non-empty subset
which sums to an integer divisible by $n$.
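One standard way to find such a subset (in fact a block of consecutive terms) is the pigeonhole principle applied to the $n+1$ partial sums $0, a_1, a_1 + a_2, \ldots$ modulo $n$: two of them must agree. A minimal Python sketch of that argument, with the hypothetical name `zero_sum_block`:

```python
def zero_sum_block(a):
    """Return (i, j) with i < j such that sum(a[i:j]) is divisible by n = len(a).

    The n + 1 prefix sums s_0 = 0, s_1, ..., s_n take at most n values mod n,
    so two coincide, say s_i = s_j, and then a[i] + ... + a[j-1] = s_j - s_i.
    """
    n = len(a)
    first_seen = {0: 0}  # residue of a prefix sum -> index at which it first appeared
    total = 0
    for j, x in enumerate(a, start=1):
        total = (total + x) % n
        if total in first_seen:
            return first_seen[total], j
        first_seen[total] = j
    raise AssertionError("unreachable by the pigeonhole principle")
```

For example, `zero_sum_block([3, 1, 4, 1, 5])` returns `(1, 3)`, pointing at the terms $1 + 4 = 5$.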


Exercise 15.7.2.† Suppose that $1 \leq a_1 < \cdots < a_n$ are positive integers, and let the integers
$0 = b_1 \leq b_2 \leq \cdots \leq b_{2^n}$ be all the sums of subsets of the $a_i$, that is, the numbers $\sum_{i \in I} a_i$ for
each $I \subseteq \{1, 2, \ldots, n\}$.
(a) Write down the generating function, $\sum_{j=1}^{2^n} x^{b_j}$, in terms of polynomials involving the $a_i$.
Henceforth we will assume that the $b_j$ are all distinct.
(b) Prove that $b_j \geq j - 1$ for each $j$.
(c) Deduce that if $0 < x < 1$, then $\sum_{j=1}^{2^n} x^{b_j} \leq \sum_{j=1}^{2^n} x^{j-1}$.
(d) Deduce further that $\prod_{i=1}^{n} (1 + x^{a_i}) \leq \prod_{i=1}^{n} (1 + x^{2^{i-1}})$.
(e) Prove that if $\alpha > 0$, then $\int_0^1 \frac{\log(1 + x^\alpha)}{x}\,dx = \frac{1}{\alpha} \int_0^1 \frac{\log(1 + y)}{y}\,dy$.
(f) Deduce that $\sum_{i=1}^{n} \frac{1}{a_i} \leq 2 - \frac{1}{2^{n-1}}$, with equality only when each $a_i = 2^{i-1}$.
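A quick numerical check of parts (b) and (f) for a couple of sets with distinct subset sums, as a sketch with exact rational arithmetic; the helper names are hypothetical, and the set $\{3, 5, 6, 7\}$ is simply one example with distinct subset sums, chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations


def subset_sums(a):
    """All 2^n subset sums b_1 <= b_2 <= ... of the list a, in increasing order."""
    return sorted(sum(c) for r in range(len(a) + 1) for c in combinations(a, r))


def reciprocal_sum_and_bound(a):
    """For a set with distinct subset sums, check part (b) and return
    (sum of 1/a_i, 2 - 1/2^(n-1)), the two sides of the inequality in part (f)."""
    b = subset_sums(a)
    if len(set(b)) != len(b):
        raise ValueError("subset sums are not distinct")
    # Part (b), 0-indexed: the j-th smallest of 2^n distinct non-negative integers is >= j.
    assert all(bj >= j for j, bj in enumerate(b))
    n = len(a)
    return sum(Fraction(1, x) for x in a), 2 - Fraction(1, 2 ** (n - 1))
```

For $\{1, 2, 4, 8\}$ both sides equal $15/8$, the case of equality, while for $\{3, 5, 6, 7\}$ the reciprocal sum is $59/70$, well below the bound.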


Exercise 15.7.3.† Let $a_1, \ldots, a_N$ be a sequence of distinct real numbers. Prove that if $m$ is
the length of the longest decreasing subsequence, and $n$ is the length of the longest increasing
subsequence, then $mn \geq N$.
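The two lengths in Exercise 15.7.3 can be computed in $O(N \log N)$ time with the "patience sorting" method sketched below (a standard technique, not one described in the text), which makes it easy to test the inequality $mn \geq N$ on examples. The function names are hypothetical.

```python
from bisect import bisect_left


def longest_increasing(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []  # tails[i] = smallest possible last term of an increasing subsequence of length i + 1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def longest_decreasing(seq):
    """Length of the longest strictly decreasing subsequence of seq."""
    return longest_increasing([-x for x in seq])
```

For example, for the sequence 2, 1, 4, 3, 6, 5 these give $n = 3$ and $m = 2$, so $mn = 6 = N$ and the inequality is sharp.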


We now prove the following related result using techniques from section 14.6.


Theorem 15.8 (Erdős-Ginzburg-Ziv). Let $p$ be prime. Any sequence of $2p-1$
integers contains a subsequence of size $p$ that sums to an integer divisible by $p$. In
fact we show that the number of such subsequences is $\equiv 1 \pmod p$.


Erdős, Ginzburg, and Ziv actually showed that for any integer $n$, any sequence of $2n-1$ integers contains
a subsequence of size $n$ that sums to an integer divisible by $n$.



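Proof. By Fermat’s Little Theorem, $1 - x^{p-1} \equiv 1 \pmod p$ if $p \mid x$, and $1 - x^{p-1} \equiv 0 \pmod p$ otherwise.
Hence the number of subsets of $\{a_1, \ldots, a_{2p-1}\}$ of size $p$ that sum to $0 \pmod p$
is
$$\sum_{\substack{I \subseteq \{1, \ldots, 2p-1\} \\ |I| = p}} \Big(1 - \Big(\sum_{i \in I} a_i\Big)^{p-1}\Big) \equiv \binom{2p-1}{p} - \sum_{k=1}^{p-1} \sum_{\substack{i_1 < \cdots < i_k \\ e_1 + \cdots + e_k = p-1}} c_{\mathbf{e}}\, a_{i_1}^{e_1} \cdots a_{i_k}^{e_k} \pmod p.$$
Here the $e_j \geq 1$ and sum to $p-1$ (so that $1 \leq k \leq p-1$). The coefficient $c_{\mathbf{e}}$ consists
of $\frac{(p-1)!}{e_1! \cdots e_k!}$ from expanding the power, multiplied by how often this coefficient occurs.
This is $\binom{2p-1-k}{p-k}$, the number of subsets $I \subseteq \{1, \ldots, 2p-1\}$ of size $p$ which contain
$\{i_1, \ldots, i_k\}$. But $\binom{2p-1-k}{p-k} \equiv 0 \pmod p$ for $1 \leq k \leq p-1$, as $p$ divides the numerator
but not the denominator, and $\binom{2p-1}{p} \equiv 1 \pmod p$. Hence the number of such subsets is $\equiv 1 \pmod p$, and in particular it is non-zero.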
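The congruence in Theorem 15.8 is easy to test by brute force for small primes. The sketch below counts the $p$-element subsequences (chosen by position, so repeated values count separately) whose sum is divisible by $p$; the function name is hypothetical.

```python
from itertools import combinations


def egz_count(seq, p):
    """Number of p-element subsequences of seq (length 2p - 1) with sum divisible by p."""
    if len(seq) != 2 * p - 1:
        raise ValueError("need a sequence of length 2p - 1")
    return sum(1 for c in combinations(seq, p) if sum(c) % p == 0)
```

For example, `egz_count([0, 0, 1, 1, 2], 3)` is 4 (pick one of the two 0s, one of the two 1s, and the 2), which is indeed $\equiv 1 \pmod 3$.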


Further reading for chapter 15 


[1] Scott Ahlgren and Ken Ono, Addition and counting: The arithmetic of partitions, Notices Amer.
Math. Soc. 48 (2001), 978-984.


[2] Henry L. Alder, Partition identities—from Euler to the present, Amer. Math. Monthly 76 (1969),
733-746.


[3] George E. Andrews, Euler’s “De Partitio numerorum”, Bull. Amer. Math. Soc. 44 (2007), 561-573.


[4] George E. Andrews and F. G. Garvan, Dyson’s crank of a partition, Bull. Amer. Math. Soc. 18
(1988), 167-171.


[5] Jordan S. Ellenberg and Dion Gijswijt, On large subsets of $\mathbb{F}_q^n$ with no three-term arithmetic
progression, Annals Math. 185 (2017), 339-343.


[6] Izabella Laba, From harmonic analysis to arithmetic combinatorics, Bull. Amer. Math. Soc. 45
(2008), 77-115.


Appendix 15A. Summing sets modulo p


We have seen in Lemma 15.3.1 that if we add two finite sets of integers $A$ and $B$,
then $|A + B| \geq |A| + |B| - 1$. Moreover much of the rest of chapter 15 has focussed
on the expansion properties of adding the same set to itself several times. Here we
focus on summing sets (mod $n$).


15.8. The Cauchy-Davenport Theorem 


Theorem 15.9 (Cauchy-Davenport). Let $p$ be a prime, and suppose that $A$ and $B$
are non-empty subsets of $\mathbb{Z}/p\mathbb{Z}$. Then
$$|A + B| \geq \min(|A| + |B| - 1,\ p).$$


Proof. The theorem is unaffected by translating $A$ and $B$, that is, replacing $A$ by
$A + x$ and $B$ by $B + y$. Sometimes it will be convenient to perform such translations.


Another useful transformation is to replace the pair of sets $A, B$ by their union
$U := A \cup B$ and their intersection $I := A \cap B$. Note that $|U| + |I| = |A| + |B|$ and
$U + I \subseteq A + B$, so that $|A + B| \geq |U + I|$. Therefore if we can prove the theorem
for the pair $I, U$ (with $I$ non-empty), then it follows for the original pair $A, B$.


Suppose without loss of generality that $|A| \geq |B|$. By translating $A$ and $B$
appropriately we may assume that $0 \in A, B$. We will prove our result by induction
on $|B|$. If $|B| = 1$, then the result is trivial. If $|B| > 1$ and there exist $a \in A$, $b \in B$
for which $B - b \not\subseteq A - a$, then the result follows by applying the induction hypothesis
to the pair $I = (A-a) \cap (B-b)$, $U = (A-a) \cup (B-b)$, since in this case $0 \in I$ and $|I| < |B|$.
Hence we may assume that $B - b \subseteq A - a$, and therefore that $B - b + a \subseteq A$, for
all $a \in A$ and $b \in B$. Taking the union over all $a \in A$, $b \in B$ we deduce that
$B - B + A \subseteq A$.

From this we see that $2B - 2B + A = B - B + (B - B + A) \subseteq B - B + A \subseteq A$,
and then, by induction, that $kB - kB + A \subseteq A$ for all $k \geq 1$. Since $0 \in B$ we deduce
that if $H$ is the subgroup of $\mathbb{Z}/p\mathbb{Z}$ generated additively by $B$, then $A + H = A$.
But there are only two subgroups of $\mathbb{Z}/p\mathbb{Z}$: either $H = \{0\}$ or $H = \mathbb{Z}/p\mathbb{Z}$. In the
first case we have $B = \{0\}$ since $B \subseteq H$. The theorem is trivial in this case. In the
second case we have $A = \mathbb{Z}/p\mathbb{Z}$, in which case the result is also trivial.
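Theorem 15.9 can be confirmed exhaustively for small primes, and the same search shows why primality matters: in $\mathbb{Z}/4\mathbb{Z}$ the subgroup $\{0, 2\}$ satisfies $\{0, 2\} + \{0, 2\} = \{0, 2\}$. A brute-force sketch (function names hypothetical):

```python
from itertools import combinations


def sumset(A, B, n):
    """A + B inside Z/nZ."""
    return {(a + b) % n for a in A for b in B}


def cauchy_davenport_holds(n):
    """Check |A + B| >= min(|A| + |B| - 1, n) for all non-empty A, B in Z/nZ."""
    subsets = [set(c) for r in range(1, n + 1) for c in combinations(range(n), r)]
    return all(
        len(sumset(A, B, n)) >= min(len(A) + len(B) - 1, n)
        for A in subsets
        for B in subsets
    )
```

Here `cauchy_davenport_holds(p)` is `True` for $p = 2, 3, 5, 7$, while `cauchy_davenport_holds(4)` is `False`.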


Exercise 15.8.1. Suppose that $A$ is a subset of $\mathbb{Z}/p\mathbb{Z}$ with at least two elements. Show that if
$n \geq \frac{p-1}{|A|-1}$, then $nA = \mathbb{Z}/p\mathbb{Z}$.

Exercise 15.8.2. Use Theorem 15.9 to show that if $A$ and $B$ are finite sets of integers, then
$|A + B| \geq |A| + |B| - 1$.


We can try to modify this argument to work in $\mathbb{Z}/n\mathbb{Z}$ or other additive groups
$G$. Some care is needed here for, if $H$ is a subgroup of $G$, then $H + H = H$, so
there is no expansion. Moreover if $A = A_0 + H$ and $B = B_0 + H$ are unions of
cosets of $H$ (with $A_0$ and $B_0$ minimal), then $A + B = (A_0 + B_0) + H$, and therefore
$$|A + B| - |A| - |B| = |H|\,(|A_0 + B_0| - |A_0| - |B_0|);$$
in the particular case that $|A_0| = |B_0| = 1$, we obtain $|A + B| = |A| + |B| - |H|$.


To state the best possible result in general finite additive groups $G$, we need to
define the stabilizer of $A$, a subset of $G$, to be
$$\mathrm{Stab}(A) := \{g \in G : g + A = A\}.$$
One can see that $H := \mathrm{Stab}(A)$ is a subgroup of $G$, with the property that $A + H =
A$, and so $A$ is a union of cosets of $\mathrm{Stab}(A)$.


Theorem 15.10 (Kneser’s Theorem). If $A$ and $B$ are finite non-empty subsets of an additive
group $G$ and $H = \mathrm{Stab}(A + B)$, then
$$|A + B| \geq |A + H| + |B + H| - |H|.$$
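The stabilizer and the inequality in Kneser’s Theorem can be computed directly for cyclic groups $\mathbb{Z}/n\mathbb{Z}$. The following brute-force sketch (hypothetical helper names) checks the inequality for every pair of non-empty subsets when $n$ is small; for $A = B = \{0, 2\}$ in $\mathbb{Z}/4\mathbb{Z}$ it gives $H = \{0, 2\}$ and equality.

```python
from itertools import combinations


def translate(S, g, n):
    """g + S inside Z/nZ."""
    return {(s + g) % n for s in S}


def stabilizer(S, n):
    """Stab(S) = {g in Z/nZ : g + S = S}."""
    S = set(S)
    return {g for g in range(n) if translate(S, g, n) == S}


def kneser_holds(A, B, n):
    """Check |A + B| >= |A + H| + |B + H| - |H| with H = Stab(A + B)."""
    AB = {(a + b) % n for a in A for b in B}
    H = stabilizer(AB, n)
    A_plus_H = {(a + h) % n for a in A for h in H}
    B_plus_H = {(b + h) % n for b in B for h in H}
    return len(AB) >= len(A_plus_H) + len(B_plus_H) - len(H)
```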


Exercise 15.8.3. Suppose that $A$ is a subset of $\mathbb{Z}/N\mathbb{Z}$ and that $A$ additively generates all of
$\mathbb{Z}/N\mathbb{Z}$; that is, there exists $r$ for which $rA = \mathbb{Z}/N\mathbb{Z}$. Prove that $NA = \mathbb{Z}/N\mathbb{Z}$.


Exercise 15.8.4. We give another proof of Theorem 15.8.
(a) Show that any sequence of $2m$ (not necessarily distinct) residues mod $p$ either has $m+1$
identical residues, or can be partitioned into $m$ sets of two distinct residues.
(b) Prove that if $A_1, \ldots, A_{p-1}$ are subsets of $\mathbb{Z}/p\mathbb{Z}$ which each contain two distinct residues,
then $A_1 + \cdots + A_{p-1} = \mathbb{Z}/p\mathbb{Z}$.
(c) Deduce Theorem 15.8.


Appendix 15B. Summing sets of integers


15.9. The Frobenius postage stamp problem, III 


If we are allowed to use no more than $N$ postage stamps, which have face values
$a$ and $b$ where $(a, b) = 1$, then what amounts of postage can we cover? In other
words, can we understand the set
$$\{am + bn : m, n \in \mathbb{Z},\ m, n \geq 0,\ m + n \leq N\}?$$


This can be rephrased in the language of additive combinatorics: Let $A = \{0, a, b\}$.
We wish to determine $NA$. Note that $A \subseteq 2A \subseteq 3A \subseteq \cdots$. Recall that in Theorem
3.12 we proved that every positive integer outside the set $E(a, b)$ can be written as
$am + bn$ with $m, n \geq 0$, where $E(a, b)$ is a subset of the integers $\leq ab - a - b$.


Theorem 15.11. Suppose that $1 \leq a < b$ are coprime integers. If $N \geq b$, then
$$NA = \{n \in \mathbb{Z} : 0 \leq n \leq bN\} \setminus E(a, b) \setminus (bN - E(b-a, b)).$$
Here $bN - E(c, b) = \{bN - r : r \in E(c, b)\}$.


Proof. If $r \in NA$, then $r = am + bn$ with $m, n \geq 0$ and $m + n \leq N$, so that $0 \leq r \leq b(m+n) \leq bN$. We
now prove that if $r$ is any integer in the range
$$ab - a - b < r < bN - ((b-a)b - (b-a) - b),$$
then $r \in NA$. Select $m$ to be the least non-negative residue of $r/a \pmod b$, so that
$0 \leq m \leq b - 1$ and $r \equiv am \pmod b$. This implies that there exists $n$ for which
$r = am + bn$. Now $bn = r - am > a(b - 1 - m) - b \geq -b$ as $r > ab - a - b$, and so
$n \geq 0$. On the other hand,
$$bn = r - am < bN - (b-a)(b-1) + b - am \leq bN - (b-a)m + b - am = b(N + 1 - m)$$
and so $m + n \leq N$. Therefore $r \in NA$.


Now suppose that $0 \leq r \leq ab - a - b$ and $r \notin E(a, b)$. As $r \notin E(a, b)$ we
can write $r = am + bn$ for some integers $m, n \geq 0$. Therefore $a(m+n) \leq r \leq
ab - a - b < a(b-2)$ and so $m + n \leq b - 3$. Hence there exists $\ell \leq b - 3$ for which
$r \in \ell A \subseteq bA \subseteq NA$.

Now suppose that $bN - ((b-a)b - (b-a) - b) \leq R \leq bN$ and $R \notin bN - E(b-a, b)$.
Let $r = bN - R$ so that $0 \leq r \leq (b-a)b - (b-a) - b$ with $r \notin E(b-a, b)$. By the
previous paragraph (with $a$ replaced by $b - a$), we see that $r = u(b-a) + vb$ with
$u, v \geq 0$ and $u + v \leq b - 3$. Therefore
$$R = bN - r = ua + (N - u - v)b = ma + nb$$
with $0 \leq m = u$, $0 \leq n = N - u - v$ (as $N \geq b$), and $m + n = u + (N - u - v) = N - v \leq N$, so
that $R \in NA$.
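Theorem 15.11 is easy to test by machine. The sketch below computes $NA$ directly for $A = \{0, a, b\}$ and compares it with the right-hand side of the theorem; $E(a, b)$ is computed by brute force over $0 \leq r < ab$, which suffices because every integer $> ab - a - b$ is representable. Function names are hypothetical.

```python
def n_fold_sumset(N, a, b):
    """NA for A = {0, a, b}: all am + bn with m, n >= 0 and m + n <= N."""
    return {a * m + b * n for m in range(N + 1) for n in range(N + 1 - m)}


def exceptional_set(a, b):
    """E(a, b): the non-negative integers not of the form am + bn with m, n >= 0."""
    limit = a * b  # every integer >= ab - a - b + 1 is representable
    # Any representation of r < ab has m < b and n < a.
    representable = {a * m + b * n for m in range(b + 1) for n in range(a + 1)}
    return {r for r in range(limit) if r not in representable}


def predicted_sumset(N, a, b):
    """Right-hand side of Theorem 15.11 (for coprime 1 <= a < b and N >= b)."""
    upper = {b * N - r for r in exceptional_set(b - a, b)}
    return set(range(b * N + 1)) - exceptional_set(a, b) - upper
```

For example, with $a = 1$, $b = 4$ and $N = 4$ both sides equal $\{0, 1, \ldots, 16\} \setminus \{11, 14, 15\}$.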


Exercise 15.9.1. Let $A = \{a, b, c\}$ where $a < b < c$ are integers for which $(b-a, c-b) = 1$.
Prove that if $N$ is sufficiently large, then
$$NA = \{n \in \mathbb{Z} : aN \leq n \leq cN\} \setminus (aN + E(b-a, c-a)) \setminus (cN - E(c-b, c-a)).$$


By exercise 3.25.5(b) we know that if $a_1 < \cdots < a_k$ are positive integers for
which $(a_1, \ldots, a_k) = 1$, then there exists a finite set $E(a_1, \ldots, a_k)$ for which
$$P(a_1, \ldots, a_k) := \{a_1 n_1 + \cdots + a_k n_k : n_1, \ldots, n_k \in \mathbb{Z}_{\geq 0}\} = \{m \geq 0\} \setminus E(a_1, \ldots, a_k).$$
Exercise 15.9.2. Suppose that $a_1 = 0$, and let $A = \{a_1, \ldots, a_k\}$.

(a) Use exercise 15.8.3 to show that if $m \geq a_k^2$, then $m \in P(a_1, \ldots, a_k)$.
(b) Let $N \geq 2a_k$. Prove that if $a_k^2 \leq m \leq a_k(N - a_k)$, then $m \in NA$.
(c) Let $N \geq a_k^2$. Prove that if $m < a_k^2$ and $m \in P(a_1, \ldots, a_k)$, then $m \in NA$.
(d) Let $N \geq a_k^2$. Deduce that if $m > a_k N - a_k^2$ and $m \in a_k N - P(a_k - a_1, \ldots, a_k - a_k)$, then
$m \in NA$.
(e) Prove that if $N$ is sufficiently large, then
$$NA = \{n \in \mathbb{Z} : 0 \leq n \leq a_k N\} \setminus E(a_1, \ldots, a_k) \setminus (a_k N - E(a_k - a_1, \ldots, a_k - a_k)).$$


(f) State the general result when $a_1$ is not necessarily 0.


Chapter 16 


The p-adic numbers 


16.1. The p-adic norm 


For a given prime number $p$ and integer $n$ we define $v_p(n)$ to be the exponent of the highest
power of $p$ that divides $n$; that is, $p^{v_p(n)} \| n$. For example $v_2(60) = 2$ whereas
$v_3(60) = 1$, $v_5(60) = 1$, and $v_7(60) = 0$, since $60 = 2^2 \cdot 3^1 \cdot 5^1 \cdot 7^0$. We can
extend this definition to all rational numbers $r$, for if we write $r = a/b$ where $a$
and $b$ are integers, then let $v_p(r) = v_p(a) - v_p(b)$. For example $v_2(49/60) = -2$,
$v_3(49/60) = -1$, $v_5(49/60) = -1$, and $v_7(49/60) = 2$. Notice that, in this definition,
it does not matter if $a$ and $b$ have a common factor. Moreover we can always write
$r = p^{v_p(r)} m/n$ where $m$ and $n$ are integers, neither of which is divisible by $p$. We
define $\mathbb{Z}_{(p)} = \{r \in \mathbb{Q} : v_p(r) \geq 0\}$, which is a subring of $\mathbb{Q}$.

The size $|r|$ of a given number $r$ measures how far that number is from 0. We
now define a notion of size in terms of the power of $p$ dividing the given rational
number $r$, measuring how far away $r$ is from 0 in terms of $p$-divisibility. Since 0 is
divisible by any arbitrarily large power of $p$, the higher the power of $p$ that divides
$r$, the closer it is to 0, and thus the smaller it is in the $p$-adic norm. This justifies
defining the $p$-adic norm, $|\cdot|_p$, by
$$|r|_p = p^{-v_p(r)}.$$


For example, $|49/60|_2 = 2^2$, $|49/60|_3 = 3$, $|49/60|_5 = 5$, $|49/60|_7 = 7^{-2}$, and
$|49/60|_p = 1$ for all primes $p > 7$. The inverses of all of these prime powers appear
in the factorization $49/60 = 2^{-2} 3^{-1} 5^{-1} 7^2$, which leads us to the product formula:
$$|r| \cdot \prod_{p \text{ prime}} |r|_p = 1. \tag{16.1.1}$$
In this context, it is convenient to write $|r|_\infty$ in place of $|r|$. (16.1.1) is reminiscent
of Hilbert’s product formula, which was stated in Theorem 9.6 of section 9.10 in
appendix 9B.


Exercise 16.1.1. Prove the product formula for all non-zero rational numbers r. 
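Here is a small sketch of $v_p$, $|\cdot|_p$ and the product formula (16.1.1) for rationals, using Python’s exact `fractions.Fraction`; only the primes dividing the numerator or denominator contribute to the product, since $|r|_p = 1$ for every other prime. The function names are hypothetical.

```python
from fractions import Fraction


def vp(r, p):
    """p-adic valuation v_p(r) of a non-zero rational r."""
    r = Fraction(r)
    if r == 0:
        raise ValueError("v_p(0) is not a finite number")
    num, den, v = r.numerator, r.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def padic_norm(r, p):
    """|r|_p = p^(-v_p(r)), as an exact fraction."""
    return Fraction(1, p) ** vp(r, p)


def prime_divisors(n):
    """Set of primes dividing the positive integer n (trial division)."""
    primes, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def product_formula(r):
    """|r| times the product of |r|_p over all primes p; equals 1 for r != 0."""
    r = Fraction(r)
    result = abs(r)
    for p in prime_divisors(abs(r.numerator) * r.denominator):
        result *= padic_norm(r, p)
    return result
```

For instance, `padic_norm(Fraction(49, 60), 7)` is `Fraction(1, 49)` and `product_formula(Fraction(49, 60))` is 1.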



Any norm introduces a notion of distance. The key to the Euclidean norm is
the triangle inequality
$$|a - c| \leq |a - b| + |b - c| \tag{16.1.2}$$
so that the distance from $a$ to $c$ is no more than the distance from $a$ to $b$ plus
the distance from $b$ to $c$.¹ For the $p$-adic norm, suppose that $a - b = p^k r$ and
$b - c = p^\ell s$, where $v_p(r) = v_p(s) = 0$. Swapping $a$ and $c$ if necessary, we may
suppose that $k \leq \ell$. Therefore
$$a - c = (a - b) + (b - c) = p^k r + p^\ell s = p^k (r + p^{\ell - k} s).$$
The denominators of $r$ and $s$ are not divisible by $p$, and so neither is the denominator
of $p^{\ell-k} s$ (since $\ell - k \geq 0$) nor the denominator of $r + p^{\ell-k} s$. Therefore $v_p(a - c) \geq k$,
which implies that
$$|a - c|_p \leq p^{-k} = \max\{|a - b|_p, |b - c|_p\}. \tag{16.1.3}$$


This is quite a different triangle inequality from (16.1.2), sometimes called the ultrametric
triangle inequality; a norm satisfying the ultrametric triangle inequality is called non-Archimedean,
whereas those that do not satisfy it are Archimedean. Non-Archimedean norms arise
naturally in this context, but the ultrametric inequality has a lot of unintuitive consequences.
For example, suppose that the $p$-adic distance $|a - c|_p$, between $a$ and $c$,
is the largest of the three edges of the triangle with vertices $a, b, c$. If $|a - b|_p$ is the
second largest, then, by (16.1.3), we deduce that $|a - c|_p \leq \max\{|a - b|_p, |b - c|_p\} = |a - b|_p$, and therefore
these two longest edges must be equally long. This implies that every triangle is
isosceles in the $p$-adic norm.
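The "every triangle is isosceles" phenomenon is easy to observe numerically. This sketch (hypothetical names; integer vertices only, for simplicity) computes the three $p$-adic side lengths of a triangle and lets one check that the two longest are always equal.

```python
from fractions import Fraction


def padic_abs(n, p):
    """|n|_p for a non-zero integer n."""
    if n == 0:
        raise ValueError("|0|_p = 0; only non-zero n are handled here")
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return Fraction(1, p ** v)


def side_lengths(a, b, c, p):
    """Sorted p-adic lengths of the sides of the triangle with distinct integer vertices a, b, c."""
    return sorted([padic_abs(a - b, p), padic_abs(b - c, p), padic_abs(a - c, p)])
```

For example, with $p = 3$ and vertices 0, 9, 12 the side lengths are $1/9$, $1/3$, $1/3$.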


Exercise 16.1.2.† A circle in $\mathbb{C}$ takes the form $B(a; r) := \{z : |z - a| < r\}$, where $a$ is the center
of the circle and $r$ is its radius. We define a $p$-adic circle to be $B_p(a; r) := \{x : |x - a|_p < r\}$.
(a) Prove that if $b \in B_p(a; r)$, then $B_p(a; r) = B_p(b; r)$. (In other words, every point inside a
$p$-adic circle can be taken to be its center.)
(b) Prove that if two circles $B_p(a; r)$ and $B_p(A; R)$, with $r \leq R$, have a point $b$ in common,
then $B_p(a; r) \subseteq B_p(A; R)$. (That is, two $p$-adic circles are either disjoint or one is contained
in the other.)


16.2. p-adic expansions 


Rational numbers are easily defined from the axioms of integer arithmetic. Each real
number is then defined as the limit of a convergent sequence of rational numbers;
indeed one of the key uses of decimal notation is to make this concept intuitively
obvious.² Thus for a real number $r$ with decimal expansion
$$r = a_m/10^m + a_{m+1}/10^{m+1} + \cdots \quad \text{with each } a_i \in \{0, 1, \ldots, 9\},$$
for some integer $m$, we define $r_k := a_m/10^m + \cdots + a_k/10^k \in \mathbb{Q}$ for each integer
$k \geq m$. The sequence $(r_k)$ converges since it is a Cauchy sequence, as $|r_k - r_\ell| <
10^{-k}$ whenever $\ell > k$, and it converges to the limit $r$, as $|r - r_k| \leq 10^{-k}$ for all
$k \geq m$. Here the distance is measured by the Euclidean norm $|\cdot| = |\cdot|_\infty$; we call
this process completing the rationals with respect to the Euclidean norm $|\cdot|_\infty$. We
now try to complete the rationals with respect to the $p$-adic norm $|\cdot|_p$, beginning
by constructing an analogous expansion for $3/7$ in the 5-adics:

¹ That is, the shortest distance between two points is a straight line.
² By understanding what goes into that intuition we construct a technique that is useful in other
contexts, as we see in this section.


The fraction $3/7 \equiv 4 \pmod 5$, and then $3/7 \equiv 4 \pmod{25}$. For the next power
of 5 we have $3/7 \equiv 54 \pmod{125}$. Since 25 divides 125 we must have $54 \equiv 3/7 \equiv 4
\pmod{25}$, and indeed $54 = 4 + 2 \cdot 5^2$. Next we have $3/7 \equiv 179 \pmod{5^4}$ and
$179 = 4 + 2 \cdot 5^2 + 5^3$. As we keep on going, through increasingly higher powers of
5, we build a 5-adic expansion for $3/7$:
$$3/7 = 4 + 2 \cdot 5^2 + 5^3 + 4 \cdot 5^4 + 2 \cdot 5^5 + 3 \cdot 5^6 + 2 \cdot 5^8 + 5^9 + 4 \cdot 5^{10} + 2 \cdot 5^{11} + 3 \cdot 5^{12} + \cdots$$
We can write this “in base 5” (without writing the powers of 5 each time) as
$$(3/7)_5 = 4021423021423021423021423021423021423\ldots$$
Notice that this is eventually periodic (with period 021423), as are the $p$-adic expansions
of all rational numbers:


Exercise 16.2.1. (a) Prove that the $p$-adic expansion of any non-zero rational number $a/n$
with $(a, n) = 1$ is eventually periodic.
(b) Show that if $p \nmid n$, then the length of the period divides $\mathrm{ord}_n(p)$.
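The digit-by-digit computation of the 5-adic expansion of $3/7$ is easy to automate for any rational number whose denominator is not divisible by $p$: the next digit is the residue of the current number mod $p$, and we then subtract it and divide by $p$. A sketch (the function name is hypothetical; `pow(d, -1, p)` needs Python 3.8 or later):

```python
from fractions import Fraction


def padic_digits(r, p, count):
    """First `count` p-adic digits a_0, a_1, ... of a rational r whose denominator is prime to p."""
    r = Fraction(r)
    if r.denominator % p == 0:
        raise ValueError("r is not a p-adic integer")
    digits = []
    for _ in range(count):
        digit = r.numerator * pow(r.denominator, -1, p) % p  # r mod p
        digits.append(digit)
        r = (r - digit) / p  # the denominator stays prime to p
    return digits
```

`padic_digits(Fraction(3, 7), 5, 13)` returns `[4, 0, 2, 1, 4, 2, 3, 0, 2, 1, 4, 2, 3]`, reproducing the expansion $4021423021423\ldots$ of $3/7$; the period has length $6 = \mathrm{ord}_7(5)$.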


For any expansion of the form
$$\alpha := a_m p^m + a_{m+1} p^{m+1} + \cdots$$
with each $a_i \in \{0, 1, 2, \ldots, p-1\}$, let $\alpha_k = a_m p^m + \cdots + a_k p^k \in \mathbb{Q}$ for each $k \geq m$.
Then $|\alpha - \alpha_k|_p \leq p^{-k-1}$ and so $\alpha$ is the limit of the rational numbers $\alpha_k$, measuring
proximity using the $p$-adic norm. These expansions are the $p$-adic numbers, denoted
by $\mathbb{Q}_p$. If $m \geq 0$, then these are the $p$-adic integers, denoted by $\mathbb{Z}_p$, whose elements
can all be written in the form
$$a_0 + a_1 p + a_2 p^2 + \cdots \quad \text{with each } 0 \leq a_i \leq p - 1.$$
The rules of arithmetic for $p$-adic numbers and integers are straightforward enough.


Exercise 16.2.2. Suppose that $(r_k)_{k \geq 1}$ is any sequence of rationals such that for any $\epsilon > 0$, if $k$
and $\ell$ are sufficiently large, then $|r_k - r_\ell|_p < \epsilon$. Prove that the $r_k$ tend to a limit with a $p$-adic
expansion.


16.3. p-adic roots of polynomials 


Suppose that prime $p$ does not divide $a$ or $n$. Starting with the root $b_1 \pmod p$ to
$x^n \equiv a \pmod p$, Proposition gives us a unique root $b_k$ to $x^n \equiv a \pmod{p^k}$
for every $k \geq 1$, where $b_k \equiv b_j \pmod{p^i}$ for all $j, k \geq i$. This implies that $|b_k - b_j|_p \leq
p^{-i}$ whenever $j, k \geq i$, so that $\lim_{k \to \infty} b_k$ exists in the $p$-adic integers, by exercise
This $p$-adic root, call it $\beta$, satisfies $\beta^n = a$ in the $p$-adics. Moreover, this
implies that the roots $\beta$ of $x^n = a$ in the $p$-adics are in 1-to-1 correspondence with
the solutions $b_1 \pmod p$ to $x^n \equiv a \pmod p$. We can find $p$-adic roots of most
equations:


Theorem 16.1 (Hensel’s Lifting Lemma). Suppose that $f(x) \in \mathbb{Z}[x]$ and that $p$
is an odd prime. If $a$ is an integer for which $f(a) \equiv 0 \pmod p$ and $f'(a) \not\equiv 0
\pmod p$, then there is a unique $p$-adic root $\alpha$ to $f(\alpha) = 0$ with $\alpha \equiv a \pmod p$. On
the other hand if $\alpha$ is a $p$-adic root of $f(\alpha) = 0$ with $|f'(\alpha)|_p = 1$, then $f(a) \equiv 0
\pmod p$ where $a \equiv \alpha \pmod p$, with $f'(a) \not\equiv 0 \pmod p$.



This follows immediately from the following proposition, which generalizes the
statement and proof of Proposition 7.20.1.


Proposition 16.3.1. Suppose that $f(x) \in \mathbb{Z}[x]$ and that $p$ is an odd prime. If
$f(a) \equiv 0 \pmod p$ and $f'(a) \not\equiv 0 \pmod p$, then for each integer $k \geq 1$ there exists
a unique residue class $a_k \pmod{p^k}$ with $a_k \equiv a \pmod p$ for which $f(a_k) \equiv 0
\pmod{p^k}$.


In Hensel’s Lifting Lemma we take $\alpha = \lim_{k \to \infty} a_k$ in the $p$-adics.


Proof. The Taylor expansion of the polynomial $f(x)$ at $x = a$ is simply the expansion
of $f$ as a polynomial in $x - a$. If $f$ has degree $d$, then
$$f(x) = f(a) + f'(a)(x - a) + f''(a)\frac{(x-a)^2}{2!} + \cdots + f^{(d)}(a)\frac{(x-a)^d}{d!}.$$
We now proceed by induction on $k \geq 2$. Suppose that $f(A) \equiv 0 \pmod{p^k}$ with
$A \equiv a \pmod p$. Then $f(A) \equiv 0 \pmod{p^{k-1}}$ with $A \equiv a \pmod p$ and so $A \equiv a_{k-1}
\pmod{p^{k-1}}$ by the induction hypothesis. Writing $A = a_{k-1} + rp^{k-1}$ for some integer
$r$, we use the Taylor expansion to deduce that
$$0 \equiv f(A) = f(a_{k-1} + rp^{k-1}) \equiv f(a_{k-1}) + f'(a_{k-1})\, r p^{k-1} \pmod{p^k},$$
as $p$ is odd. Hence $r$ is determined by
$$r \equiv -f(a_{k-1})/(f'(a_{k-1}) p^{k-1}) \equiv -(f(a_{k-1})/p^{k-1})/f'(a) \pmod p,$$
as $p^{k-1}$ divides $f(a_{k-1})$ by the induction hypothesis, and $f'(a_{k-1}) \equiv f'(a) \not\equiv 0
\pmod p$ as $a_{k-1} \equiv a \pmod p$, so that $f'(a)$ is invertible mod $p$. Hence $r$ belongs
to a unique residue class mod $p$, and therefore $A$ is given uniquely $\pmod{p^k}$.
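The induction in this proof is an algorithm: at each step we solve one linear congruence mod $p$ for the next digit $r$. A minimal Python sketch, assuming $f$ is given by its list of integer coefficients starting from the constant term (an illustrative convention; the function name is hypothetical):

```python
def hensel_lift(coeffs, a, p, k):
    """Return a_k mod p^k with a_k ≡ a (mod p) and f(a_k) ≡ 0 (mod p^k).

    coeffs = [c_0, c_1, ..., c_d] represents f(x) = c_0 + c_1 x + ... + c_d x^d.
    Requires f(a) ≡ 0 and f'(a) ≢ 0 (mod p).
    """
    def f(x):
        return sum(c * x ** i for i, c in enumerate(coeffs))

    def df(x):
        return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)

    if f(a) % p != 0 or df(a) % p == 0:
        raise ValueError("need f(a) ≡ 0 and f'(a) ≢ 0 (mod p)")
    inverse = pow(df(a) % p, -1, p)  # f'(a_{j-1}) ≡ f'(a) (mod p)
    root = a % p
    for j in range(2, k + 1):
        # root = a_{j-1}; p^(j-1) divides f(root), and r ≡ -(f(root)/p^(j-1)) / f'(a) (mod p)
        r = -(f(root) // p ** (j - 1)) * inverse % p
        root += r * p ** (j - 1)
    return root
```

With $f(x) = x^2 - 7$ and $p = 3$, `hensel_lift([-7, 0, 1], 1, 3, 4)` returns 13, agreeing with the hand computation of the square root of 7 in the 3-adics, and starting from $a = 2 \equiv -1$ it returns $68 = 81 - 13$.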


Exercise 16.3.1. Show that if $f(x) \in \mathbb{Z}[x]$ has no repeated roots, then there are only finitely
many primes $p$ for which there exists an integer $a_p$ with $f(a_p) \equiv f'(a_p) \equiv 0 \pmod p$.


Exercise 16.3.2. Prove that if odd prime $p$ does not divide $a$, then there are exactly $1 + \left(\frac{a}{p}\right)$
square roots of $a \pmod p$.


Exercise 16.3.3. (a) If prime $p \nmid a$, show that the sequence $a_n = a^{p^n}$ converges in the $p$-adics.
(b) Show that $\alpha := \lim_{n \to \infty} a_n$ is a $(p-1)$st root of unity and that all solutions to $x^{p-1} - 1 = 0$ in
$\mathbb{Z}_p$ can be obtained in this way.
(c) Conclude that $i := \lim_{n \to \infty} 2^{5^n}$ is a square root of $-1$ in $\mathbb{Q}_5$.
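Part (c) can be watched numerically: since $a^{p^{n+1}} \equiv a^{p^n} \pmod{p^{n+1}}$, the integer $a^{p^n} \bmod p^{n+1}$ already agrees with the limit $\alpha$ to $n+1$ digits. A sketch for $p = 5$, $a = 2$ (the function name is hypothetical):

```python
def teichmuller_approx(a, p, n):
    """a^(p^n) reduced mod p^(n+1); agrees mod p^(n+1) with the (p-1)st root of unity ≡ a (mod p)."""
    return pow(a, p ** n, p ** (n + 1))


for n in range(1, 6):
    i_n = teichmuller_approx(2, 5, n)
    # The last column is 0, since i_n^2 ≡ -1 (mod 5^(n+1)).
    print(n, i_n, (i_n * i_n + 1) % 5 ** (n + 1))
```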


What are the square roots of 7 in the 3-adics? Let $f(x) = x^2 - 7$ and $p = 3$.
For $a = 1$ and $-1$ we have $f(a) \equiv 0 \pmod 3$ but $f'(a) = 2a \not\equiv 0 \pmod 3$, so we can
apply Hensel’s Lemma. We can lift the root which is $\equiv 1 \pmod 3$ as follows:

We write $\alpha = 1 + 3k$ for some integer $k$. Then $7 \equiv \alpha^2 = (1 + 3k)^2 \equiv 1 + 6k
\pmod 9$ and so $k \equiv 1 \pmod 3$, and $\alpha \equiv 4 \pmod 9$.

We write $\alpha = 4 + 9\ell$ for some integer $\ell$. Then $7 \equiv \alpha^2 = (4 + 9\ell)^2 \equiv 16 - 9\ell
\pmod{27}$ and so $\ell \equiv 1 \pmod 3$, and $\alpha \equiv 13 \pmod{27}$.


We write $\alpha = 13 + 27m$ for some integer $m$. Then $7 \equiv \alpha^2 = (13 + 27m)^2 \equiv
7 - 27m \pmod{81}$ and so $m \equiv 0 \pmod 3$, and $\alpha \equiv 13 \pmod{81}$.


In summary we have established that if $\alpha^2 = 7$ in the 3-adics with $\alpha \equiv 1
\pmod 3$, then $\alpha \equiv 13 = 1 \cdot 3^0 + 1 \cdot 3^1 + 1 \cdot 3^2 + 0 \cdot 3^3 \pmod{3^4}$. We can continue
in this way to calculate the digits of $\alpha$, though we now introduce a more efficient
root-finding technique:


The Newton-Raphson method is an iterative technique to find a root of a polynomial
$f(x) \in \mathbb{R}[x]$. The idea is to start with a first guess $a_0$ at a root of $f(x) = 0$
and then to construct a sequence of increasingly good approximations $a_1, a_2, \ldots$ by
taking
$$a_{n+1} = a_n - \frac{f(a_n)}{f'(a_n)},$$
which is the correction suggested by the slope of $f$ at $a_n$. We can adapt this
algorithm to lifting solutions modulo powers of $p$.


Proposition 16.3.2. Suppose that $f(x) \in \mathbb{Z}[x]$ and that $p$ is an odd prime. If $a$
is an integer for which $f(a) \equiv 0 \pmod{p^k}$ with $k \geq 1$ and $f'(a) \not\equiv 0 \pmod p$, then
$f(A) \equiv 0 \pmod{p^{2k}}$ with $A \equiv a \pmod{p^k}$, where $A = a - f(a)/f'(a)$.


Proof. Let $\Delta = -f(a)/f'(a)$ so that $A = a + \Delta$, and $v_p(\Delta) \geq k$ so that $A \equiv a
\pmod{p^k}$. If $j \geq 2$, then $v_p(\Delta^j/j!) \geq 2k$, and the Taylor expansion gives
$$f(A) = f(a + \Delta) = f(a) + \Delta f'(a) + \frac{\Delta^2}{2!} f''(a) + \cdots \equiv f(a) + \Delta f'(a) = 0 \pmod{p^{2k}}.$$


The convergence here is much more rapid than in our previous algorithm, for if
$|f(a_0)|_p \leq p^{-1}$ and $f'(a_0) \not\equiv 0 \pmod p$, then one can prove by induction that $|f(a_n)|_p \leq p^{-2^n}$ for all $n \geq 1$. In our
previous example, finding a square root of 7 in the 3-adics starting with $a_0 = 1$, we
obtain, as $f(x) = x^2 - 7$, the transformation
$$a_{n+1} = a_n - \frac{a_n^2 - 7}{2a_n} = \frac{1}{2}\left(a_n + \frac{7}{a_n}\right).$$
Therefore $a_1 = 4$; then $a_2 = 23/8 \equiv 13 \pmod{81}$, and, replacing $a_2$ by 13, $a_3 = \frac{1}{2}\left(13 + \frac{7}{13}\right) = 88/13 \pmod{6561}$.
We have $a_3^2 \equiv 7 \pmod{6561}$ as $88^2 - 7 \cdot 13^2 = 6561$. Therefore if $\alpha^2 = 7$ in the
3-adics with $\alpha \equiv 1 \pmod 3$, then we have determined that $\alpha \equiv 88/13 \pmod{3^8}$,
via an algorithm that converges far faster than the method we used above.
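Proposition 16.3.2 turns into an algorithm that doubles the number of correct $p$-adic digits at each step. In this sketch (hypothetical function name; coefficients listed from the constant term, as an illustrative convention) the division by $f'(a)$ is carried out as multiplication by an inverse modulo the current power of $p$:

```python
def newton_lift(coeffs, a, p, target):
    """Root of f mod p^target that is ≡ a (mod p), by p-adic Newton iteration.

    Each step maps a root mod p^k to a root mod p^(2k) via A = a - f(a)/f'(a).
    """
    def f(x):
        return sum(c * x ** i for i, c in enumerate(coeffs))

    def df(x):
        return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)

    if f(a) % p != 0 or df(a) % p == 0:
        raise ValueError("need f(a) ≡ 0 and f'(a) ≢ 0 (mod p)")
    k = 1
    while k < target:
        k = min(2 * k, target)
        modulus = p ** k
        a = (a - f(a) * pow(df(a) % modulus, -1, modulus)) % modulus
    return a % p ** target
```

For $f(x) = x^2 - 7$, $p = 3$ and $a = 1$, the iterates are 4 (mod 9), 13 (mod 81) and then the residue of $88/13$ modulo $3^8 = 6561$, reached in three steps.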


16.4. p-adic factors of a polynomial 


Theorem 16.2. Suppose that $f(x) \in \mathbb{Z}_{(p)}[x]$ and can be factored as $g_1(x) \cdots g_m(x)
\pmod p$ where $\deg(f - g_1 \cdots g_m) < \deg(f)$, and $g_i(x)$ and $g_j(x)$ have no common
polynomial factor mod $p$, whenever $i \neq j$. Then, there exist unique polynomials
$G_1(x), \ldots, G_m(x) \in \mathbb{Z}_{(p)}[x]$ with $G_j(x) \equiv g_j(x) \pmod p$ and $\deg(G_j - g_j) <
\deg(g_j)$ for each $j$, for which $f(x) = G_1(x) \cdots G_m(x)$ in the $p$-adics.


Proof. We will prove this with $m = 2$ and then the result for larger $m$ follows
by induction. So we write $f(x) \equiv g(x)h(x) \pmod p$ where $g(x)$ and $h(x)$ have no
common polynomial factor mod $p$, and therefore there exist $u(x), v(x) \in \mathbb{Z}[x]$ for
which $g(x)u(x) + h(x)v(x) \equiv 1 \pmod p$ with $\deg(u) < \deg(h)$ and $\deg(v) < \deg(g)$.



We will prove, by induction, that for each $k \geq 1$, there exist unique polynomials
$g_k(x), h_k(x) \pmod{p^k}$, with $g_k(x) \equiv g(x) \pmod p$, $h_k(x) \equiv h(x) \pmod p$, and
$f(x) \equiv g_k(x) h_k(x) \pmod{p^k}$ where $\deg(g - g_k) < \deg(g)$ and $\deg(h - h_k) < \deg(h)$.
The result then follows with $f(x) = G(x)H(x)$ where $G(x)$ and $H(x)$ are the $p$-adic
limits, as $k \to \infty$, of $g_k(x)$ and $h_k(x)$, respectively.

This is true by definition for $k = 1$, so now let $k \geq 1$. If $f(x) \equiv A(x)B(x)
\pmod{p^{k+1}}$ with $A(x) \equiv g(x) \pmod p$, $B(x) \equiv h(x) \pmod p$ and $\deg(A - g) <
\deg(g)$, $\deg(B - h) < \deg(h)$, then $f(x) \equiv A(x)B(x) \pmod{p^k}$ and so $A(x) \equiv g_k(x)
\pmod{p^k}$ and $B(x) \equiv h_k(x) \pmod{p^k}$ by the uniqueness of $g_k$ and $h_k$. Therefore we
can write $A(x) = g_k(x) + p^k a(x)$ and $B(x) = h_k(x) + p^k b(x)$ with $\deg(a) < \deg(g)$
and $\deg(b) < \deg(h)$. This implies that
$$\begin{aligned} f(x) \equiv A(x)B(x) &= (g_k(x) + p^k a(x))(h_k(x) + p^k b(x)) \\ &\equiv g_k(x)h_k(x) + p^k(a(x)h_k(x) + b(x)g_k(x)) \\ &\equiv g_k(x)h_k(x) + p^k(a(x)h(x) + b(x)g(x)) \pmod{p^{k+1}}. \end{aligned}$$
Let $\Delta_k(x) = (f(x) - g_k(x)h_k(x))/p^k \in \mathbb{Z}_{(p)}[x]$, so that $f \equiv AB \pmod{p^{k+1}}$ if and only if
$a(x)h(x) + b(x)g(x) \equiv \Delta_k(x) \pmod p$. We select $a_k(x)$ to be the polynomial
of minimal degree with
$$a_k(x) \equiv v(x)\Delta_k(x) \pmod{(p, g(x))}$$
so that $\deg(a_k) < \deg(g)$. Writing $a_k = v\Delta_k + \ell(x)g(x) \pmod p$ for some $\ell(x) \in
\mathbb{Z}[x]$, we let $b_k(x) = u(x)\Delta_k(x) - \ell(x)h(x)$, so that
$$a_k(x)h(x) + b_k(x)g(x) \equiv (v\Delta_k + \ell g)h + (u\Delta_k - \ell h)g = (ug + vh)\Delta_k \equiv \Delta_k \pmod p.$$


Therefore there must exist a solution to $f(x) \equiv A(x)B(x) \pmod{p^{k+1}}$ satisfying all
the desired hypotheses. It remains to show that this solution is unique: If we have
any other solution, then $ah + bg \equiv a_k h + b_k g \pmod p$ so that $(a - a_k)h \equiv (b_k - b)g
\pmod p$. Since $g$ and $h$ have no common factor we deduce that $g$ divides $a - a_k$ mod $p$;
if $a \not\equiv a_k \pmod p$ this gives $\deg(a - a_k) \geq \deg(g)$, which is impossible. Hence $a \equiv a_k \pmod p$,
and then $(b_k - b)g \equiv 0 \pmod p$ gives $b \equiv b_k \pmod p$.
The result therefore follows with $g_{k+1} = g_k + p^k a_k$ and $h_{k+1} = h_k + p^k b_k$.
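The proof is constructive. The Python sketch below follows it for two factors, under simplifying assumptions that are ours, not the text’s: $f$, $g$ and $h$ are monic with integer coefficients, polynomials are lists of coefficients starting from the constant term, and $u, v$ come from the extended Euclidean algorithm in $\mathbb{F}_p[x]$. All helper names are hypothetical. Each pass of the loop computes $\Delta_k$, then $a_k$ as the remainder of $v\Delta_k$ on division by $g$, and $b_k = u\Delta_k + qh$ where $q$ is the quotient (so $q = -\ell$ in the notation of the proof).

```python
def trim(f):
    """Drop trailing zero coefficients."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g):
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n)]


def scale(c, f):
    return [c * x for x in f]


def mul(f, g):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out


def reduce_mod(f, m):
    return trim([x % m for x in f])


def divmod_p(f, g, p):
    """Quotient and remainder of f by g in F_p[x] (g non-zero mod p)."""
    f, g = reduce_mod(f, p), reduce_mod(g, p)
    lead_inverse = pow(g[-1], -1, p)
    q = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g):
        shift = len(f) - len(g)
        c = f[-1] * lead_inverse % p
        q[shift] = c
        f = reduce_mod(add(f, scale(-c, [0] * shift + g)), p)
    return reduce_mod(q, p), f


def bezout_p(g, h, p):
    """u, v with g*u + h*v ≡ 1 (mod p), assuming g and h are coprime in F_p[x]."""
    r0, r1 = reduce_mod(g, p), reduce_mod(h, p)
    u0, u1, v0, v1 = [1], [], [], [1]  # invariant: r_i ≡ g*u_i + h*v_i (mod p)
    while r1:
        q, r = divmod_p(r0, r1, p)
        r0, r1 = r1, r
        u0, u1 = u1, reduce_mod(add(u0, scale(-1, mul(q, u1))), p)
        v0, v1 = v1, reduce_mod(add(v0, scale(-1, mul(q, v1))), p)
    if len(r0) != 1:
        raise ValueError("g and h have a common factor mod p")
    c = pow(r0[0], -1, p)
    return reduce_mod(scale(c, u0), p), reduce_mod(scale(c, v0), p)


def hensel_factor_lift(f, g, h, p, k):
    """Lift f ≡ g*h (mod p) (all monic, g and h coprime mod p) to f ≡ G*H (mod p^k)."""
    u, v = bezout_p(g, h, p)
    g, h = reduce_mod(g, p), reduce_mod(h, p)
    for j in range(1, k):
        pj = p ** j
        diff = add(f, scale(-1, mul(g, h)))  # every coefficient is divisible by p^j
        delta = reduce_mod([c // pj for c in diff], p)  # Delta_j mod p
        q, a = divmod_p(mul(v, delta), g, p)  # v*Delta = q*g + a with deg(a) < deg(g)
        b = reduce_mod(add(mul(u, delta), mul(q, h)), p)  # then a*h + b*g ≡ Delta (mod p)
        g = reduce_mod(add(g, scale(pj, a)), pj * p)
        h = reduce_mod(add(h, scale(pj, b)), pj * p)
    return g, h
```

For example, $x^3 - 2 \equiv (x - 3)(x^2 + 3x + 4) \pmod 5$ with coprime factors, and `hensel_factor_lift([-2, 0, 0, 1], [-3, 1], [4, 3, 1], 5, 6)` lifts this to a factorization modulo $5^6$ whose linear factor exhibits a cube root of 2 in $\mathbb{Z}/5^6\mathbb{Z}$.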


Factoring polynomials in $\mathbb{Z}[x]$ efficiently


We wish to factor a given polynomial $f(x) \in \mathbb{Z}[x]$. It is not difficult to find any
factors that appear to a power $> 1$ by calculating $(f, f')$, so we may assume that
$f(x)$ has no repeated factors. We will need a bound on the size of the coefficients
of any potential factor $g(x)$ of $f(x)$. To do this we need to measure the “size” of a
given polynomial
$$f(x) = \sum_{i=0}^{d} f_i x^i = a \prod_{j=1}^{d} (x - \alpha_j) \in \mathbb{C}[x].$$
Two ways to do this are given by the 2-norm $\|f\|$, a non-negative real number
which is defined by
$$\|f\|^2 = \sum_{i=0}^{d} |f_i|^2, \quad \text{and the Mahler measure} \quad M(f) = |a| \prod_{j=1}^{d} \max\{1, |\alpha_j|\}.$$



We need to compare these two:
Exercise 16.4.1. Assume $g(x) \in \mathbb{C}[x]$ and $\alpha \in \mathbb{C}$ with $\alpha \neq 0$.
(a) Prove that $\|(x - \alpha)g(x)\| = \|(\bar{\alpha}x - 1)g(x)\|$ and $M((x - \alpha)g(x)) = M((\bar{\alpha}x - 1)g(x))$.
(b) If $|\alpha| \leq 1$ whenever $f(\alpha) = 0$, prove that $M(f) \leq \|f\| \leq M(f)\left(\sum_{j=0}^{n} \binom{n}{j}^2\right)^{1/2}$, where $f$
has degree $n$. (We note that $\sum_{j=0}^{n} \binom{n}{j}^2 = \binom{2n}{n} \leq 2^{2n}$.)
(c) Deduce that if $f$ has degree $n$, then $M(f) \leq \|f\| \leq 2^n M(f)$.
(d) Suppose that $g(x) \in \mathbb{Z}[x]$ divides $f(x) \in \mathbb{Z}[x]$. Prove that $M(g) \leq M(f)$.
(e) Deduce that if $g$ has degree $d$, then $\|g\| \leq 2^d \|f\|$.
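The two measures of size can be compared numerically. This sketch uses NumPy’s `numpy.roots` to approximate the roots, so the Mahler measure is computed in floating point rather than exactly; note that `numpy.roots` expects coefficients from the highest degree down, the reverse of the indexing $f_i$ above. The function names are hypothetical.

```python
import numpy as np


def two_norm(coeffs):
    """||f||: the square root of the sum of |f_i|^2."""
    return float(np.sqrt(sum(abs(c) ** 2 for c in coeffs)))


def mahler_measure(coeffs):
    """M(f) = |a| * prod max(1, |alpha_j|), with coeffs listed from the leading coefficient a down."""
    roots = np.roots(coeffs)
    return float(abs(coeffs[0]) * np.prod([max(1.0, abs(z)) for z in roots]))
```

For $f(x) = x^2 - 7$ we get $M(f) = 7$ (up to rounding) and $\|f\| = \sqrt{50} \approx 7.07$, consistent with $M(f) \leq \|f\| \leq 2^2 M(f)$ from part (c).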


Here is the Berlekamp-Zassenhaus algorithm for factoring a given squarefree
polynomial $f(x) \in \mathbb{Z}[x]$ of degree $n$:

• Calculate $\Delta(f)$, the discriminant of $f(x)$, by using the Euclidean algorithm
over $\mathbb{Z}[x]$. This is a non-zero integer as $f(x)$ has no repeated factors (as proved
in sections and of appendix 2B).

• Select a prime $p$ which does not divide $\Delta(f)$. Therefore $f(x) \pmod p$ is
squarefree, by Corollary 3.23.1.

• Use the algorithm of section 10.21 (of appendix 10H) to efficiently factor $f(x)
\pmod p$ into irreducibles.

• Select $k$ to be the smallest integer for which $p^k > 2^n \|f\|$.

• By Theorem and its proof, we can efficiently lift the factors of $f(x)
\pmod p$, uniquely, to a factorization $f(x) \equiv g_1(x) \cdots g_m(x) \pmod{p^k}$.

• For each $S \subseteq \{1, \ldots, m\}$ let $g_S(x) \equiv \prod_{j \in S} g_j(x) \pmod{p^k}$, a polynomial with
integer coefficients for which each coefficient lies in $(-p^k/2, p^k/2]$.

• Test whether each $g_S(x)$ is a factor of $f(x)$ in $\mathbb{Z}[x]$.


This gives all the factors of $f(x)$. For if $g(x)$ properly divides $f(x)$, then exercise
16.4.1(e) implies that each coefficient $g_i$ of $g$ satisfies $|g_i| \leq 2^{n-1}\|f\| < p^k/2$, as
$d \leq n - 1$. Moreover if we reduce the equality $f(x) = g(x)h(x)$ modulo $p^k$, then we
see that there must exist some proper subset $S$ of $\{1, \ldots, m\}$ for which $g(x) \equiv g_S(x)
\pmod{p^k}$. The coefficients of their difference are less than $p^k$ in absolute value, and so $g(x) = g_S(x)$.


Further reading on factoring polynomials 


[1] Susan Landau, Factoring polynomials quickly, Notices Amer. Math. Soc. 34 (1987), 3-8. 


16.5. Possible norms on the rationals 


A norm on $\mathbb{Q}$ is a map $|\cdot| : \mathbb{Q} \to \mathbb{R}$ for which $|x| \geq 0$ for all $x \in \mathbb{Q}$, with $|x| = 0$ if
and only if $x = 0$; $|xy| = |x| \cdot |y|$; and $|x + y| \leq |x| + |y|$.


Exercise 16.5.1. Prove that $|1| = |-1| = 1$.


We have defined the Euclidean norm $|\cdot|_\infty$ and the $p$-adic norm $|\cdot|_p$. We now
prove that, up to taking powers, these are the only possibilities:
Theorem 16.3 (Ostrowski’s Theorem). A norm on the rational numbers is either
$|\cdot|_\infty^\kappa$ for some constant $\kappa$, $0 \leq \kappa \leq 1$, or $|\cdot|_p^\kappa$ for some prime $p$, for some constant
$\kappa > 0$.



Proof. Suppose that $|m| > 1$ for some integer $m > 1$. Fix any other integer $b > 1$,
and then define $B := \max\{|c| : 0 \leq c \leq b - 1\}$. Any integer $N \geq 1$ can be written in
base $b$ as $m_0 + m_1 b + \cdots + m_d b^d$ where each $m_i \in \{0, 1, 2, \ldots, b-1\}$ and $m_d \geq 1$.
Therefore
$$|N| \leq \sum_{i=0}^{d} |m_i b^i| = \sum_{i=0}^{d} |m_i|\,|b|^i \leq (d+1)\, B \max\{1, |b|^d\},$$
as there are $d + 1$ terms in the sum, each $|m_i| \leq B$, and each $|b|^i \leq 1$ if $|b| \leq 1$,
with each $|b|^i \leq |b|^d$ if $|b| > 1$.
We let $N = m^n$ for any integer $n \geq 1$, so that $|N| = |m|^n$. Now $b^d \leq N$ and so
$d \leq \frac{\log N}{\log b} = n \frac{\log m}{\log b}$. Substituting this into the inequality above gives
$$|m|^n \leq \left(1 + n\frac{\log m}{\log b}\right) B \max\left\{1, |b|^{n \log m/\log b}\right\}.$$
We will take $n$th roots of both sides and let $n \to \infty$. We notice that $(u + nv)^{1/n} \to 1$
as $n \to \infty$ for any $u, v > 0$, and so deduce that
$$|m| \leq \max\left\{1, |b|^{\log m/\log b}\right\}.$$
Since $|m| > 1$ this implies that $|b| > 1$ and therefore, taking logarithms, we have
$$\frac{\log |m|}{\log m} \leq \frac{\log |b|}{\log b}.$$
Since $|b| > 1$ we may run the same argument with the roles of $b$ and $m$ reversed and
obtain the opposite inequality, and combining these we get equality. But this holds
for all integers $b > 1$. Hence there exists a number $\kappa$ for which $|b| = b^\kappa = |b|_\infty^\kappa$, for
all integers $b > 1$.

As $|m| > 1$ we deduce that $\kappa > 0$. Since $2^\kappa = |2| = |1 + 1| \leq |1| + |1| = 2$, therefore
$\kappa \leq 1$. One can show that the triangle inequality holds whenever $0 < \kappa \leq 1$.

If $|n| = 1$ for all $n \geq 1$, then $|n| = |n|_\infty^0$.

We may now assume that $|n| \leq 1$ for all integers $n \geq 1$, and that $|m| < 1$ for
some $m > 1$. By multiplicativity we know that $|p| < 1$ for some prime $p$ dividing
$m$. There can be no other prime $q$ with $|q| < 1$, or else we select a large power
$e$, so large that $|q|^e < 1 - |p|$. Since $(p, q^e) = 1$ there exist integers $a, b$ for which
$ap + bq^e = 1$ and therefore
$$1 = |1| = |ap + bq^e| \leq |ap| + |bq^e| = |a||p| + |b||q|^e \leq |p| + |q|^e < 1,$$
a contradiction. Therefore, if $|p| = p^{-\kappa} = |p|_p^\kappa$, then $|\cdot| = |\cdot|_p^\kappa$. Taking powers of
(16.1.3) we see that any norm $|\cdot|_p^\kappa$ satisfies the ultrametric triangle inequality and
therefore satisfies the Euclidean triangle inequality.


16.6. Power series convergence and the p-adic logarithm 


Theorem 16.4. Let $(a_n)_{n \geq 0}$ be an infinite sequence of $p$-adic numbers. The infinite
sum $\sum_{n \geq 0} a_n$ converges to some number $L$ (in the $p$-adics) if and only if $a_n \to 0$
as $n \to \infty$.



Proof. Suppose that $a_n \to 0$ as $n \to \infty$. Fix $\epsilon > 0$ and $M_0$ so that $|a_n|_p < \epsilon$ whenever $n > M_0$. Using the ultrametric inequality we have that if $N > M > M_0$, then

$$\Big|\sum_{n\le N} a_n - \sum_{n\le M} a_n\Big|_p = \Big|\sum_{M<n\le N} a_n\Big|_p \le \max_{M<n\le N} |a_n|_p < \epsilon.$$

Therefore the partial sums of the $a_n$ form a Cauchy sequence, and so $\sum_{n\ge 0} a_n$ converges to some number $L$.

On the other hand, if $\sum_{n\ge 0} a_n$ converges to some number $L$, then for any $\epsilon > 0$ there exists $M_1$ such that $|\sum_{n\le M} a_n - L|_p < \epsilon$ whenever $M \ge M_1$. Therefore if $N \ge M_1 + 1$, then $a_N = (\sum_{n\le N} a_n - L) - (\sum_{n\le N-1} a_n - L)$, so that

$$|a_N|_p \le \max\Big\{\Big|\sum_{n\le N} a_n - L\Big|_p,\ \Big|\sum_{n\le N-1} a_n - L\Big|_p\Big\} < \epsilon.$$

We deduce that $a_n \to 0$ as $n \to \infty$.
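As a small numerical illustration of Theorem 16.4 (our own, not part of the argument), take $a_n = p^n$. These terms tend to 0 $p$-adically, so the sum converges, and its limit is $1/(1-p)$. The Python sketch below checks that the error in the partial sum $S_N$ has $p$-adic valuation exactly $N + 1$. The helper names `vp` and `partial_sum` are ours. Exact rationals from `fractions.Fraction` stand in for genuine $p$-adic numbers.

```python
from fractions import Fraction


def vp(x: Fraction, p: int) -> int:
    """p-adic valuation of a non-zero rational number x."""
    if x == 0:
        raise ValueError("v_p(0) is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def partial_sum(p: int, N: int) -> Fraction:
    """S_N = sum of p^n for 0 <= n <= N."""
    return sum((Fraction(p) ** n for n in range(N + 1)), Fraction(0))


p = 5
L = Fraction(1, 1 - p)  # the p-adic limit of sum_{n>=0} p^n
for N in range(6):
    # S_N - L = -p^(N+1)/(1-p), so the error has valuation exactly N + 1
    print(N, vp(partial_sum(p, N) - L, p))
```

The same partial sums diverge in the reals. Only the choice of absolute value differs.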


Exercise 16.6.1. Let $p$ be a given prime.
(a) Prove that $\sum_{n\ge 0} z^n/a_n$ converges when $|z|_p < p^{-\gamma}$, where $\gamma = \limsup_{n\to\infty} v_p(a_n)/n$.
(b) Deduce that $\sum_{n\ge 1} z^n/n$ converges if $|z|_p < 1$. (In $\mathbb{C}$ this also converges inside $|z| < 1$.)

Exercise 3.10.1(b) states that $v_p(n!) = \frac{n - s_p(n)}{p-1}$, where $s_p(n)$ is the sum of the digits of $n$ when written in base $p$.
(c) Deduce that $\sum_{n\ge 0} z^n/n!$ converges if $|z|_p < p^{-1/(p-1)}$.
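The formula $v_p(n!) = (n - s_p(n))/(p-1)$ quoted above is easy to test numerically. The sketch below uses helper names of our own. It compares the formula with a direct count of the factors of $p$ in $1, 2, \dots, n$. The formula explains part (c): $v_p(z^n/n!) \ge n\,(v_p(z) - 1/(p-1))$, which tends to infinity when $v_p(z) > 1/(p-1)$.

```python
def vp_int(k: int, p: int) -> int:
    """Exponent of the prime p in the positive integer k."""
    v = 0
    while k % p == 0:
        k //= p
        v += 1
    return v


def digit_sum(n: int, p: int) -> int:
    """s_p(n): the sum of the base-p digits of n."""
    s = 0
    while n:
        n, d = divmod(n, p)
        s += d
    return s


def vp_factorial(n: int, p: int) -> int:
    """v_p(n!) computed directly, one factor at a time."""
    return sum(vp_int(k, p) for k in range(1, n + 1))


# Compare the direct count with (n - s_p(n)) / (p - 1) for small n and p.
for p in (2, 3, 5, 7):
    for n in range(300):
        assert vp_factorial(n, p) == (n - digit_sum(n, p)) // (p - 1)
```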


We define the $p$-adic logarithm to be

$$\log_p(z) := -\sum_{n\ge 1} \frac{(1-z)^n}{n}$$

whenever $|1 - z|_p < 1$ with $z \in \mathbb{Z}_p$ (this sum converges by exercise 16.6.1(b)).


Similarly we define the $p$-adic exponential to be

$$\exp_p(z) := \sum_{n\ge 0} \frac{z^n}{n!}$$

wherever $|z|_p < p^{-1/(p-1)}$ (this sum converges by exercise 16.6.1(c)).


Exercise 16.6.2. Suppose that $|a|_p, |b|_p < p^{-1/(p-1)}$.
(a) Prove that $|a + b|_p < p^{-1/(p-1)}$.
(b) Deduce that $\exp_p(a + b) = \exp_p(a)\exp_p(b)$.


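In exercise (b) we saw that $\frac{1}{p}\binom{p}{j} \equiv \frac{(-1)^{j-1}}{j} \pmod p$ for $1 \le j \le p-1$, so that if $z \in \mathbb{Z}_{(p)}$, then

$$-\sum_{1\le m\le p-1} \frac{(1-z)^m}{m} \equiv \frac{1}{p}\left((1-z)^p - 1 + z^p\right) \pmod p.$$

This suggests that there might be a convenient expression of this type for $\log_p(z)$.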
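Both congruences above can be checked by machine for small primes. The sketch below is our own illustration. It uses `math.comb` and the modular inverse `pow(j, -1, p)`, which is available from Python 3.8. It only checks odd primes and integer values of $z$. For $p = 2$ the second congruence needs the extra hypothesis $|1 - z|_2 < 1$, which is in any case the setting of the $p$-adic logarithm.

```python
from math import comb


def binomial_congruence_holds(p: int) -> bool:
    """Check (1/p) * C(p, j) == (-1)^(j-1) / j (mod p) for 1 <= j <= p - 1."""
    for j in range(1, p):
        lhs = (comb(p, j) // p) % p
        rhs = (-1) ** (j - 1) * pow(j, -1, p) % p
        if lhs != rhs:
            return False
    return True


def log_congruence_holds(p: int, z: int) -> bool:
    """Check -sum_{m<p} (1-z)^m / m == ((1-z)^p - 1 + z^p) / p (mod p)."""
    lhs = -sum((1 - z) ** m * pow(m, -1, p) for m in range(1, p)) % p
    numerator = (1 - z) ** p - 1 + z ** p  # divisible by p, by Fermat
    rhs = (numerator // p) % p
    return lhs == rhs


for p in (3, 5, 7, 11, 13):
    print(p, binomial_congruence_holds(p),
          all(log_congruence_holds(p, z) for z in range(-20, 21)))
```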


Exercise 16.6.3. Suppose that $v_p(x) > 0$.
(a) Prove that if $p^k \le m < p^{k+1}$, then $v_p(x^m/m) \ge p^k v_p(x) - k$, for each integer $k \ge 0$.
(b) Suppose that $v_p(x) \ge 2r/p^r$ for some integer $r \ge 1$. Deduce that $v_p(x^m/m) \ge k$ for all $m \ge p^k$, whenever $k \ge r$.




Lemma 16.6.1. For any $z \in \mathbb{Z}_{(p)}$ for which $|1 - z|_p < 1$, we have

$$\log_p(z) = \lim_{k\to\infty} \frac{z^{p^k} - 1}{p^k}.$$

Proof. Let $x = 1 - z$, so that $v_p(x) > 0$. We can select an integer $r \ge 1$ for which $v_p(x) \ge 2r/p^r$, as $i/p^i \to 0$ as $i \to \infty$. Let $k$ be any integer $\ge 3r$ and let $\ell$ be the largest integer $\le k/3$, so that $\ell \ge r$. Therefore if $m \ge p^{\ell+1}$, then $v_p(x^m/m) \ge \ell + 1 > k/3$ by exercise 16.6.3(b).


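For $1 \le m \le p^k$ we have

$$\frac{1}{p^k}\binom{p^k}{m} = \frac{1}{m}\prod_{j=1}^{m-1}\frac{p^k - j}{j} = (-1)^{m-1}\frac{c_m}{m}, \quad\text{where } c_m := \prod_{j=1}^{m-1}\left(1 - \frac{p^k}{j}\right).$$

We let $c_m = 0$ for all $m > p^k$, so that $-\log_p(z) = -\log_p(1 - x) = \sum_{m\ge 1} x^m/m$ and

$$\frac{1 - z^{p^k}}{p^k} = \frac{1 - (1-x)^{p^k}}{p^k} = \sum_{m\ge 1} c_m \frac{x^m}{m}.$$

Therefore

$$-\log_p(z) - \frac{1 - z^{p^k}}{p^k} = \sum_{m\ge 1} (1 - c_m)\frac{x^m}{m}.$$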
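Now if $1 \le j < m \le p^k$, then $1 - \frac{p^k}{j} \in \mathbb{Z}_{(p)}$, and so $1 - c_m \in \mathbb{Z}_{(p)}$ for all $m \ge 1$; that is, $|1 - c_m|_p \le 1$. Therefore if $m \ge p^{\ell+1}$, then

$$\Big|(1 - c_m)\frac{x^m}{m}\Big|_p \le \Big|\frac{x^m}{m}\Big|_p < p^{-k/3}$$

by the first paragraph.

If $1 \le j < m < p^{\ell+1}$, then $1 - \frac{p^k}{j} \equiv 1 \pmod{p^{k-\ell}}$, and so $|1 - c_m|_p \le p^{-(k-\ell)}$. Now $|m|_p \ge p^{-\ell}$ and so

$$\Big|(1 - c_m)\frac{x^m}{m}\Big|_p \le p^{\ell-(k-\ell)} = p^{-(k-2\ell)} \le p^{-k/3}.$$

Combining these last two estimates, we deduce that

$$\Big|\log_p(z) + \frac{1 - z^{p^k}}{p^k}\Big|_p = \Big|\sum_{m\ge 1}(1 - c_m)\frac{x^m}{m}\Big|_p \le \max_{m\ge 1}\Big|(1 - c_m)\frac{x^m}{m}\Big|_p \le p^{-k/3},$$

and the result follows, letting $k \to \infty$.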
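To see Lemma 16.6.1 in action, here is a numerical sketch of our own, with $p = 3$ and $z = 4$ chosen only for illustration. It truncates the series for $\log_3(4)$ after 60 terms, which is harmless here: every omitted term $(-3)^n/n$ with $n > 60$ has 3-adic valuation at least 61. It then compares the truncation with $(z^{p^k} - 1)/p^k$. The 3-adic valuation of the difference grows with $k$.

```python
from fractions import Fraction


def vp(x: Fraction, p: int) -> int:
    """p-adic valuation of a non-zero rational number x."""
    if x == 0:
        raise ValueError("v_p(0) is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def log_p_truncated(z: int, terms: int) -> Fraction:
    """-sum_{1 <= n <= terms} (1 - z)^n / n, a truncation of log_p(z)."""
    x = Fraction(1 - z)
    return -sum((x ** n / n for n in range(1, terms + 1)), Fraction(0))


def lemma_approx(z: int, p: int, k: int) -> Fraction:
    """(z^(p^k) - 1) / p^k, the approximation in Lemma 16.6.1."""
    return Fraction(z ** (p ** k) - 1, p ** k)


p, z = 3, 4                              # |1 - z|_3 = 1/3 < 1
log_value = log_p_truncated(z, 60)
for k in range(1, 6):
    print(k, vp(log_value - lemma_approx(z, p, k), p))
```

The proof only guarantees a valuation of roughly $k/3$. In this example the agreement is noticeably better.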


Exercise 16.6.4. Prove that $\sum_{m\ge 1} 2^m/m = 0$ in the 2-adics.


Exercise 16.6.5. Assume that $|a - 1|_p, |b - 1|_p < 1$.
(a) Prove that $\lim_{k\to\infty} a^{p^k} = 1$.
(b) Deduce that $\log_p(ab) = \log_p(a) + \log_p(b)$.
(c) Deduce that if $a = b^n$, then $\log_p(a) = n\log_p(b)$.
(d)† Suggest an algorithm for the discrete log problem in the $p$-adics.




At the moment the function $\log_p(z)$ is defined only when $|z - 1|_p < 1$. For any $\beta \in \mathbb{Z}_p$ with $|\beta|_p = 1$, there exists an integer $b$ with $b \equiv \beta \pmod p$ and $b \not\equiv 0 \pmod p$. Therefore, by Fermat's Little Theorem, $\beta^{p-1} \equiv b^{p-1} \equiv 1 \pmod p$, and so $\log_p(\beta^{p-1})$ is well-defined. Taking our lead from exercise 16.6.5(c) we therefore define

$$\log_p(\beta) := \frac{\log_p(\beta^{p-1})}{p-1} = -\sum_{n\ge 1}\frac{(1 - \beta^{p-1})^n}{n(p-1)}.$$

Exercise 16.6.6. Assume that $\alpha, \beta \in \mathbb{Z}_p$.
(a) Prove that $\log_p(-\alpha) = \log_p(\alpha)$.
(b) Prove that $\log_p(\alpha\beta) = \log_p(\alpha) + \log_p(\beta)$.

Any $\gamma \in \mathbb{Z}_p$ can be written in the form $\gamma = p^e\beta$ where $|\beta|_p = 1$, so we define

$$\log_p(\gamma) = e\log_p(p) + \log_p(\beta).$$


16.7. The p-adic dilogarithm 


For each $k \ge 1$, define

$$\mathcal{L}_k(x) := \sum_{m\ge 1}\frac{x^m}{m^k}.$$

The case $k = 2$ is the dilogarithm function.
Exercise 16.7.1. (a) Prove that the sum defining $\mathcal{L}_k(x)$ converges for all $x \in \mathbb{C}$ with $|x|_\infty \le 1$ for all $k \ge 2$, and for $|x|_p < 1$ in the $p$-adics.
(b) Establish that $\mathcal{L}_k(x) + \mathcal{L}_k(-x) = 2^{1-k}\mathcal{L}_k(x^2)$ when $|x|_p < 1$.
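Theorem 16.5. If $|1 - z|_p < 1$, then

$$\mathcal{L}_2(1 - z) + \mathcal{L}_2(1 - z^{-1}) = -\frac{1}{2}(\log_p z)^2. \qquad (16.7.1)$$

In particular $\mathcal{L}_2(2) = 0$ in the 2-adics.

Proof. For $|x|_p < 1$, we have

$$\frac{d\mathcal{L}_2(x)}{dx} = \sum_{m\ge 1}\frac{x^{m-1}}{m} = -\frac{\log_p(1 - x)}{x},$$

and so, by the chain rule, we have

$$\frac{d}{dz}\Big(\mathcal{L}_2(1 - z) + \mathcal{L}_2(1 - z^{-1})\Big) = -\mathcal{L}_2'(1 - z) + z^{-2}\mathcal{L}_2'(1 - z^{-1}) = \frac{\log_p(z)}{1 - z} - \frac{z^{-2}\log_p(z^{-1})}{1 - z^{-1}} = -\frac{\log_p z}{z}.$$

Integrating yields $\mathcal{L}_2(1 - z) + \mathcal{L}_2(1 - z^{-1}) = -\frac{1}{2}(\log_p z)^2 + C$ for some constant $C$. Taking $z = 1$ we see that $C = 0$, yielding (16.7.1).

Replacing $z$ by $z^2$, we obtain

$$\mathcal{L}_2(1 - z^2) + \mathcal{L}_2(1 - z^{-2}) = -2(\log_p z)^2 = 4\big(\mathcal{L}_2(1 - z) + \mathcal{L}_2(1 - z^{-1})\big).$$

When $p = 2$ we may take $z = -1$ in this equation, and so $8\mathcal{L}_2(2) = 2\mathcal{L}_2(0) = 0$.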


We can select any value for $\log_p(p)$ as is convenient in context; we do not have to let it be 1.




Exercise 16.7.2. Let $p = 2$ and $|z - 1|_2 < 1$.
(a) Prove that $\mathcal{L}_2(1 - z) + \mathcal{L}_2(1 + z) = \frac{1}{2}\mathcal{L}_2(1 - z^2) + C$ for some constant $C$.
(b) Prove that $C = 0$ using (16.7.1).
(c) Deduce (again) that $\mathcal{L}_2(2) = 0$.


We have now seen that

$$\sum_{n\ge 1}\frac{2^n}{n} = \sum_{n\ge 1}\frac{2^n}{n^2} = 0$$

in the 2-adics. It is interesting to see how rapidly this convergence happens. If $n \ge N \ge 2^k$, then $v_2(2^n/n) \ge 2^k - k$, so that

$$\sum_{n<N}\frac{2^n}{n} = -\sum_{n\ge N}\frac{2^n}{n} \equiv 0 \pmod{2^{2^k-k}},$$

and similarly

$$\sum_{n<N}\frac{2^n}{n^2} \equiv 0 \pmod{2^{2^k-2k}}.$$

It looks like there might be a pattern here. How about $\sum_n 2^n/n^3$? Unfortunately the $n = 4$ term gives the unique maximum, $2^2$, of $|2^n/n^3|_2$, and so $|\sum_n 2^n/n^3|_2 = 4$, not 0.
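These rates can be observed directly. The Python sketch below is our own illustration. It computes the partial sums $\sum_{n<N} 2^n/n^e$ exactly for $N = 2^k$ and prints their 2-adic valuations next to the lower bounds $2^k - k$ (for $e = 1$) and $2^k - 2k$ (for $e = 2$). It also shows that for $e = 3$ the valuation stays at $-2$, because the $n = 4$ term dominates.

```python
from fractions import Fraction


def v2(x: Fraction) -> int:
    """2-adic valuation of a non-zero rational number."""
    if x == 0:
        raise ValueError("v_2(0) is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % 2 == 0:
        num //= 2
        v += 1
    while den % 2 == 0:
        den //= 2
        v -= 1
    return v


def partial(N: int, e: int) -> Fraction:
    """sum_{1 <= n < N} 2^n / n^e."""
    return sum((Fraction(2 ** n, n ** e) for n in range(1, N)), Fraction(0))


for k in range(1, 7):
    N = 2 ** k
    print(k, v2(partial(N, 1)), 2 ** k - k, v2(partial(N, 2)), 2 ** k - 2 * k)
print(v2(partial(40, 3)))   # -2: the n = 4 term dominates
```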


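Exercise 16.7.3. Prove that if $|x|_p, |y|_p < 1$, then

$$\mathcal{L}_2(x) + \mathcal{L}_2(y) - \mathcal{L}_2(xy) - \mathcal{L}_2\Big(\frac{x(1-y)}{1-xy}\Big) - \mathcal{L}_2\Big(\frac{y(1-x)}{1-xy}\Big) = \log_p\Big(\frac{1-x}{1-xy}\Big)\log_p\Big(\frac{1-y}{1-xy}\Big).$$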


Further reading on p-adics 


[1] Richard M. Hill, Introduction to number theory, chapter 4, World Scientific, Singapore, 2018. 


Chapter 17 


Rational points on elliptic curves


Any equation of degree 3 in two variables with a rational solution can be transformed, via an invertible transformation with rational coefficients, to an equation of the form

$$E:\ y^2 = x^3 + ax + b.$$

We call the curve $E$, given by such an equation, an elliptic curve. When one draws the real points $(x, y)$ on this curve, denoted by $E(\mathbb{R})$, there are two possible shapes, depending on the number of real zeros of $x^3 + ax + b$:


Figure 17.1. Elliptic curves with one and three real zeros, respectively (the curves $y^2 = x^3 + 1$ and $y^2 = x^3 - x$).


There can be no real points with an $x$-coordinate for which $x^3 + ax + b < 0$; in particular, there are none to the left of the graph.

Exercise 17.0.1. Let $\Delta = 4a^3 + 27b^2$. Show that if $a > 0$ or if $\Delta > 0$, then $x^3 + ax + b = 0$ has just one real root. Show that if $a, \Delta < 0$, then $x^3 + ax + b = 0$ has three real roots. Sketch the shape of the curve $y^2 = x^3 + ax + b$ in the two cases.




17.1. The group of rational points on an elliptic curve 


The rational solutions of the original equation of degree 3 in two variables are in 1-to-1 correspondence with the rational solutions of $y^2 = x^3 + ax + b$, as long as we include the point(s) at $\infty$.¹ If the original equation had rational coefficients, then we may take $a$ and $b$ to be integers. We are interested in understanding the rational points on $E$, that is, those $(x, y)$ on $E$ with $x, y \in \mathbb{Q}$, which we denote by $E(\mathbb{Q})$.

Suppose that we are given two distinct points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2) \in E(\mathbb{Q})$. The line between them is either vertical (when $x_1 = x_2$) or of the form $y = mx + \nu$ with $m, \nu \in \mathbb{Q}$.² These two points are both intersections of the line with the elliptic curve $y^2 = x^3 + ax + b$. If the line is $y = mx + \nu$, then $x_1, x_2$ satisfy

$$(mx + \nu)^2 = y^2 = x^3 + ax + b;$$

in other words $x_1$ and $x_2$ are two of the three roots of the cubic polynomial

$$x^3 - m^2x^2 + (a - 2m\nu)x + (b - \nu^2) = 0.$$

If the third root is $x_3$, then $x_3 = m^2 - x_1 - x_2 \in \mathbb{Q}$, and if $y_3 = mx_3 + \nu$, then $P_3 = (x_3, y_3) \in E(\mathbb{Q})$ is the third intersection of the line with $E$. This method for generating a third rational point on $E$ from two given ones goes back to Fermat.³


Figure 17.2. Obtaining a new rational point from two old ones: Draw the line through $P_1$ and $P_2$. The third point of intersection is $P_3$. If $P_1$ and $P_2$ have rational coordinates, then so does $P_3$.


If $x_1 = x_2$, then $y_2 = -y_1$, and the vertical line ($x = x_1$) between $P_1$ and $P_2$ appears to meet the curve at only these two points. Is there a third point of intersection? There is no other point of intersection on the graph (that is, in the real plane), but the line stretches to infinity, and indeed the third point is, rather surprisingly, the point at infinity, which we denote by $O$. The best way to see this is to rewrite the curve in projective coordinates: That is, we change $(x, y) \to (x/z, y/z)$ and multiply the equation of the curve through by $z^3$ to obtain $y^2z = x^3 + axz^2 + bz^3$. The point at infinity, $O$, is $(0, 1, 0)$, the only point with $z = 0$, up to scalar multiplication of all three coordinates. We see that it lies on the (projective) line $x = x_1z$.

¹This same issue occurs in linear transformations of all equations, even of degree 1. For example, the rational points $(x, y)$ on $x + 2y = 3$ are in 1-to-1 correspondence with the rational points $(u, v)$ on $u + 2 = 3v$, via the invertible transformation $u = x/y$, $v = 1/y$. However there is a problem with the point $(x, y) = (3, 0)$, as this transforms to $(u, v) = (\infty, \infty)$; and similarly the point $(u, v) = (-2, 0)$ transforms back to $(x, y) = (\infty, \infty)$. To take these properly into account we can work with integer solutions to the equation $a + 2b = 3c$, with $a, b, c$ not all 0. These are called the projective coordinates, and solutions for which the ratios $a : b : c$ are the same are considered to be equivalent. (In the original two affine equations, the solutions are given by $x = a/c$, $y = b/c$, and $u = a/b$, $v = c/b$.)

²In section we saw that two rational points on the unit circle give rise to a line with rational coefficients and vice versa; this allowed us to find all the rational points on the unit circle.

³And compare this with the method for parameterizing rational points on a quadratic curve given, for example, by exercise 6.1.2.

One can do even better and generate a second point from a given one: If $P_1 = (x_1, y_1) \in E(\mathbb{Q})$, let $y = mx + \nu$ be the equation of the tangent line to $y^2 = x^3 + ax + b$ at $(x_1, y_1)$. To calculate this, we simply differentiate to obtain $2y_1m = 3x_1^2 + a$ and then $\nu = y_1 - mx_1$. In this case, our cubic polynomial has a double root at $x = x_1$, and we again compute a third point by taking $x_3 = m^2 - 2x_1$, $y_3 = mx_3 + \nu$, so that $P_3 = (x_3, y_3) \in E(\mathbb{Q})$.


Figure 17.3. Obtaining a new rational point from an old one: Draw the tangent line through $P_1$. The other point of intersection is $P_3$. If $P_1$ has rational coordinates, then so does $P_3$.


Exercise 17.1.1. Prove that there cannot be four points of E(Q) on the same line. 


Poincaré made an extraordinary observation: If we take any three points $P, Q, R$ of $E$ on the same line, then we can define a group by taking

$$P + Q + R = O.$$

This implies that in the first example above, $(x_1, y_1) + (x_2, y_2) + (x_3, y_3) = O$. The line at infinity tells us that $O + O + O = O$, and therefore the point $O$ is the zero of this group. The group equation becomes

$$P + Q + R = 0.$$




Moreover we have seen that $(x, y) + (x, -y) + O = O$, which implies that $(x, y) + (x, -y) = O$, and therefore if $P = (x, y)$, then $-P = (x, -y)$. Returning to the first example above, this implies that

$$(x_1, y_1) + (x_2, y_2) = (x_3, -y_3).$$
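The chord-and-tangent recipe, together with the reflection $-P = (x, -y)$, can be written out directly. The following Python sketch is our own illustration, and the function names are hypothetical. It uses exact rationals and represents the point $O$ by `None`. It follows the formulas above: find the slope $m$ and the intercept $\nu$, set $x_3 = m^2 - x_1 - x_2$, and reflect. The test curve $y^2 = x^3 - 36x$ is chosen only as an example.

```python
from fractions import Fraction
from typing import Optional, Tuple

Point = Optional[Tuple[Fraction, Fraction]]  # None stands for O


def on_curve(P: Point, a: int, b: int) -> bool:
    if P is None:
        return True
    x, y = P
    return y * y == x ** 3 + a * x + b


def neg(P: Point) -> Point:
    return None if P is None else (P[0], -P[1])


def add(P: Point, Q: Point, a: int, b: int) -> Point:
    """P + Q on y^2 = x^3 + ax + b via the chord-and-tangent construction."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None                       # vertical line: third point is O
    if P == Q:
        m = (3 * x1 * x1 + a) / (2 * y1)  # tangent slope, from 2y m = 3x^2 + a
    else:
        m = (y2 - y1) / (x2 - x1)         # chord slope
    nu = y1 - m * x1
    x3 = m * m - x1 - x2                  # third root of the cubic
    y3 = m * x3 + nu                      # P3 = (x3, y3) lies on the line
    return (x3, -y3)                      # P + Q = -P3


a, b = -36, 0                             # y^2 = x^3 - 36x
P = (Fraction(12), Fraction(36))
Q = (Fraction(-2), Fraction(8))
print(add(P, Q, a, b))                    # (Fraction(-6, 1), Fraction(0, 1))
print(add(P, P, a, b))                    # (Fraction(25, 4), Fraction(-35, 8))
```

With exact rational arithmetic every output point is again in $E(\mathbb{Q})$, as the text asserts.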


Figure 17.4. The group law: Adding two points $P_1$ and $P_2$ to obtain, geometrically, $P_1 + P_2$.


The addition operation is evidently closed; that is, adding points in $E(\mathbb{R})$, including $O$, yields another point in $E(\mathbb{R})$. Similarly, addition is closed within the subgroup $E(\mathbb{Q})$. It is complicated to establish that addition is associative, which we must do to prove that we indeed have a group (see for details). It is also obvious that the addition law is commutative.


Exercise 17.1.2. In Figure 17.3 the polynomial $x^3 + ax + b$ has three real roots, $r_1 < r_2 < r_3$, and so the points of $E(\mathbb{R})$ come in two continuous components, the egg and the infinite part

$$E^+(\mathbb{R}) = \{(x, y) \in E(\mathbb{R}) : x \ge r_3\} \cup \{O\}.$$

(a) Prove that a straight line intersects the egg in exactly 0 or 2 points (counted with multiplicity).
(b) Prove that the tangent at any point $P \in E(\mathbb{R})$ hits $E$ again in $E^+(\mathbb{R})$.
(c) Deduce that $E^+(\mathbb{R})$ is a subgroup of $E(\mathbb{R})$.
(d) Deduce that the egg is the coset $(r_2, 0) + E^+(\mathbb{R})$ (that is, a coset of $E^+(\mathbb{R})$ in $E(\mathbb{R})$).


Suppose that we have a point $P \in E(\mathbb{R})$. Take the tangent, find the third point of intersection of the tangent line with $E$ to obtain $-2P$, and then reflect in the $x$-axis to obtain $2P$. Fermat suggested that if we repeat this process over and over again, then we are unlikely to come back again to the same point. If we never return to the same point, then we say that $P$ has infinite order; otherwise $P$ has finite order, the order being the minimum positive integer $n$ for which $nP = 0$ (points of finite order are known as torsion points).

Exercise 17.1.3. Prove that if $2P = O$, then either $P = O$ or $P = (x, 0)$ where $x^3 + ax + b = 0$. Deduce that the number of rational points of order 1 or 2 is one plus the number of integer roots of $x^3 + ax + b$, and therefore equals 1, 2, or 4.


Exercise 17.1.4. Prove that the torsion points form a subgroup of E(C). 




Exercise 17.1.5. Suppose that $x^3 + ax + b$ has three real roots.
(a) Prove that if $P \in E(\mathbb{R})$ is a torsion point of odd order, then $P \in E^+(\mathbb{R})$.
(b) Prove that if a torsion point $P$ of $E$ lies on the egg, then it is one of the points of order 2 at either end of the egg.


So far this discussion has largely been geometric, focusing on the real points of $E$. Number theorists wish to identify the structure of the group of rational points, $E(\mathbb{Q})$. Is $E(\mathbb{Q})$ finite or infinite? Mordell's Theorem states that $E(\mathbb{Q})$ is finitely generated, and so it can be written in the form $T \oplus \mathbb{Z}^r$ for some integer $r \ge 0$, where $T$ is the set of points of finite order (the torsion points) and is finite. In other words, there exists a set of points $P_1, \dots, P_r$ of infinite order such that

$$E(\mathbb{Q}) = \{t + a_1P_1 + \cdots + a_rP_r : t \in T \text{ and each } a_i \in \mathbb{Z}\}.$$

Each of these points is distinct. We say that $E(\mathbb{Q})$ has rank $r$, though strictly speaking we are only referring to the rank of the quotient group $E(\mathbb{Q})/T$.


Exercise 17.1.6. Deduce from Mordell’s Theorem that E(Q) is finite if and only if its rank r = 0. 


We have already seen this group structure, with $r = 1$, when we were considering solutions to Pell's equation in section 11.2. There, all solutions take the form $\pm\epsilon_d^a$ with $a \in \mathbb{Z}$, so that the group of units of $\mathbb{Q}(\sqrt d)$, when $d > 0$, is generated by $-1$ and $\epsilon_d$ and has structure $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}$, the $\pm 1$ being torsion. There can be more torsion in the group of units of $\mathbb{Q}(\sqrt d)$ than just $\pm 1$ when $d < 0$: For example, in $\mathbb{Q}(i)$ we also have the units $\pm i$, so the unit group structure is $\mathbb{Z}/4\mathbb{Z}$, generated by $i$.


Lutz and Nagell showed that if $(x, y)$ is a torsion point in $E(\mathbb{Q})$, then $x$ and $y$ are integers, and either $y = 0$ or $y^2$ divides $4a^3 + 27b^2$, the discriminant of $x^3 + ax + b$. Therefore there are only finitely many rational torsion points. Mazur improved this by showing that the torsion subgroup of $E(\mathbb{Q})$ contains at most 16 points: It either has the structure $\mathbb{Z}/N\mathbb{Z}$ for some $1 \le N \le 10$ or $N = 12$, or the structure $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2N\mathbb{Z}$ for some $1 \le N \le 4$.


The proof of Mordell’s Theorem is the main focus of the second part of our 
book but is a bit too complicated to develop here. Instead, to get some 
idea of how we prove Mordell’s Theorem, we focus on a very special class of elliptic 
curves: 


17.2. Congruent number curves 


A Pythagorean triangle is a right-angled, rational-sided triangle. Is there a Pythagorean triangle with area equal to a given rational number $A$? In section 6.1 we saw that Pythagorean triangles can be parametrized by edge lengths $g(t^2 - 1)$, $2gt$, $g(t^2 + 1)$ where $g, t \in \mathbb{Q}$. (We also need that $t > 1$ and $g > 0$ for these lengths to be positive.) This triangle has area $\frac12 \cdot g(t^2 - 1) \cdot 2gt = g^2(t^3 - t)$, which equals $A$ if $(At, A^2/g)$ is a rational point on the elliptic curve

$$E_A:\ y^2 = x^3 - A^2x.$$

Moreover any rational point $(x, y)$ on this curve, with $x > A$ (so that $(x, y) \in E_A^+(\mathbb{Q})$) and $y > 0$, generates a Pythagorean triangle of area $A$ by taking $t = x/A$, $g = A^2/y$.
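As a concrete check of this correspondence, the sketch below (our own, with a hypothetical function name) turns a point $(x, y) \in E_A^+(\mathbb{Q})$ with $y > 0$ into side lengths via $t = x/A$, $g = A^2/y$. For $A = 6$, the point $(12, 36)$ on $y^2 = x^3 - 36x$ gives back the familiar 3-4-5 triangle.

```python
from fractions import Fraction


def triangle_from_point(x, y, A):
    """Sides (g(t^2-1), 2gt, g(t^2+1)) with t = x/A and g = A^2/y."""
    t = Fraction(x) / A
    g = Fraction(A * A) / Fraction(y)
    return g * (t * t - 1), 2 * g * t, g * (t * t + 1)


A = 6
leg1, leg2, hyp = triangle_from_point(12, 36, A)   # (12, 36) lies on E_6
print(leg1, leg2, hyp)                             # 3 4 5
assert leg1 * leg2 / 2 == A and leg1 ** 2 + leg2 ** 2 == hyp ** 2
```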


Exercise 17.2.1. Prove that if $A = ar^2$ for some rational $r$, then $E_A(\mathbb{Q})$ is isomorphic to $E_a(\mathbb{Q})$. (We may therefore restrict our attention to $E_A$ where $A$ is a squarefree positive integer.)




The elliptic curves $E_A$ are called congruent number curves. They each have three points, $(-A, 0)$, $(0, 0)$, $(A, 0)$, of order two.


Exercise 17.2.2. Suppose that $P \in E_A(\mathbb{Q})$ has order $> 2$. Prove that exactly one of the points $P$ and $P + (0, 0)$ belongs to $E_A^+(\mathbb{Q})$. Therefore this point yields a Pythagorean triangle of area $A$, with parameters $t = x/A$, $g = A^2/|y|$.

Exercise 17.2.2 implies that the set of Pythagorean triangles of area $A$ has a structure that can be best understood via the rational points of $E_A$.


Theorem 17.1. Let $A$ be a squarefree, positive integer. If there is one Pythagorean triangle of area $A$, then there are infinitely many. Starting with one of area $A$ with parameter $t > 1$, we obtain infinitely many by iterating the map

$$t \mapsto \frac{(t^2 + 1)^2}{4(t^3 - t)}.$$

Proposition 17.2.1. Let $A$ be a squarefree, positive integer. If $(x, y) \in E_A(\mathbb{Q})$, then we may write $x$ and $y$ as $x = m/n^2$ and $y = \ell/n^3$ where $\ell^2 = m(m^2 - A^2n^4)$ with $(\ell m, n) = 1$ and $n > 0$. If $P = (x, y) \in E_A(\mathbb{Q})$ with $y \ne 0$, then $2P = (X, Y)$ where

$$X = \left(\frac{x^2 + A^2}{2y}\right)^2 = \left(\frac{m^2 + A^2n^4}{2\ell n}\right)^2.$$

This also gives

$$X - A = \left(\frac{x^2 - 2Ax - A^2}{2y}\right)^2 \quad\text{and}\quad X + A = \left(\frac{x^2 + 2Ax - A^2}{2y}\right)^2,$$

so that $2P \in E_A^+(\mathbb{Q})$. We may write $X = M/N^2$ where $(M, AN) = 1$ and $N \ge n$.


Given a Pythagorean triangle with parameter $t = m/(An^2)$, this proposition yields another Pythagorean triangle of area $A$, this one with parameter $T = \frac{(t^2 + 1)^2}{4(t^3 - t)}$.


Proof. If $(x, y) \in E_A(\mathbb{Q})$, then we may write $x$ and $y$ as $x = m/u$ and $y = \ell/v$ where $(m, u) = (\ell, v) = 1$ and $u, v > 0$. Then

$$u^3\ell^2 = u^3(vy)^2 = v^2 \cdot u^3y^2 = v^2 \cdot u^3(x^3 + ax + b) = v^2(m^3 + amu^2 + bu^3).$$

Now $(u, m^3 + amu^2 + bu^3) = (u, m^3) = 1$ as $(u, m) = 1$, so that, since $u^3$ divides $v^2(m^3 + amu^2 + bu^3)$, we deduce that $u^3$ divides $v^2$ by Euclid's lemma. An analogous argument reveals that $v^2$ divides $u^3$, and so $v^2 = u^3$, since they are both positive. This implies that there exists a positive integer $n$ for which $v = n^3$ and $u = n^2$, and the first claim follows.


The second claim, that $X$, $X - A$, and $X + A$ are all squares, is simply a calculation. We can write $X = M/N^2$ with $(M, N) = 1$ just as we did with $x$. We need to prove that $(M, A) = 1$. If not, then there exists a prime $p$ dividing $M$ and $A$, so $p$ divides $M = XN^2$ and $M - AN^2 = (X - A)N^2$, which are each squares. Therefore $p^2$ divides them both, and so it divides their difference $AN^2$. However $p$ cannot divide $N$ as $(M, N) = 1$, and so $p^2$ divides $A$, which contradicts the fact that $A$ is squarefree.

From our formula for $X$ we have $N = 2\ell n/g$ where $g = (2\ell n, m^2 + A^2n^4)$. Now $(n, m^2 + A^2n^4) = (n, m^2) = 1$, and so $g$ divides $2\ell$. Therefore $N = (2\ell/g)n \ge n$.
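The doubling formula and the map of Theorem 17.1 can be iterated exactly. The Python sketch below is our own illustration, and its function names are hypothetical. It starts from the 3-4-5 triangle, that is, the point with $x = 12$ on $E_6$. It repeatedly applies $x \mapsto ((x^2 + A^2)/2y)^2$, written with $y^2 = x^3 - A^2x$. It checks that each $x/A$ agrees with the map $t \mapsto (t^2+1)^2/(4(t^3 - t))$ and rebuilds each triangle. The numerators $m_k$ grow as in exercise 17.2.3(c), which applies once $(m_k, A) = 1$.

```python
from fractions import Fraction
from math import isqrt


def rational_sqrt(q: Fraction) -> Fraction:
    """Exact square root of a non-negative rational that is a square."""
    rn, rd = isqrt(q.numerator), isqrt(q.denominator)
    if rn * rn != q.numerator or rd * rd != q.denominator:
        raise ValueError(f"{q} is not a square")
    return Fraction(rn, rd)


def double_x(x: Fraction, A: int) -> Fraction:
    """x(2P) = ((x^2 + A^2) / 2y)^2 on y^2 = x^3 - A^2 x (Proposition 17.2.1)."""
    return (x * x + A * A) ** 2 / (4 * (x ** 3 - A * A * x))


def next_t(t: Fraction) -> Fraction:
    """The map t -> (t^2 + 1)^2 / (4(t^3 - t)) of Theorem 17.1."""
    return (t * t + 1) ** 2 / (4 * (t ** 3 - t))


def triangle(x: Fraction, A: int):
    """Pythagorean triangle of area A attached to a point with x-coordinate x > A."""
    y = rational_sqrt(x ** 3 - A * A * x)
    t, g = x / A, Fraction(A * A) / y
    return g * (t * t - 1), 2 * g * t, g * (t * t + 1)


A = 6
xs = [Fraction(12)]          # x-coordinate of (12, 36): the 3-4-5 triangle
for _ in range(3):
    xs.append(double_x(xs[-1], A))
for x in xs:
    print(x / A, triangle(x, A))
```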




Exercise 17.2.3. Let $x_P = m/n^2 > A$ and $x_{2P} = M/N^2$ with $(m, n) = (M, N) = 1$.
(a) Prove that $M < 4m^4$.
(b) Prove that if $(m, A) = 1$, then $(m^2 + A^2n^4, 2\ell n) = 1$ or $2$, and $N$ is even.
(c) Deduce that if $(m, A) = 1$, then $M > m^4/4$, and also $M > m^4$ if $n$ is even.
(d) Deduce that, in general, $(m^2 + A^2n^4, 2\ell n)$ divides $2(m, A)^2$.


Proof of Theorem 17.1. A Pythagorean triangle of area $A$ leads to a rational point $P = (x_P, y_P) \in E_A(\mathbb{Q})$ with $x_P > A$. We will write $x_{2^kP} = m_k/n_k^2$ for all $k \ge 0$. By Proposition 17.2.1 we can write $x_P = At$ where $t := m_0/(An_0^2) > 1$. Moreover $x_{2P} = AT$ where $T := (t^2 + 1)^2/(4(t^3 - t)) = m_1/(An_1^2) > 1$ with $(m_1, An_1) = 1$ and $n_1 \ge n_0$.

We deduce from Proposition 17.2.1, by induction on each $k \ge 1$, that $x_{2^kP} > A$ with $(m_k, An_k) = 1$, and so $m_{k+1} > m_k^4/4$ by exercise 17.2.3. Therefore the $m_k$ are increasing, and so the $2^kP$ are distinct. We can translate each such point, via exercise, to obtain an infinite sequence of distinct Pythagorean triangles of area $A$.


Exercise 17.2.4. Prove that the torsion subgroup of $E_A(\mathbb{Q})$ is isomorphic to $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$.


Our goal now is to understand the points of infinite order on congruent number 
curves. We begin though with an easier example. 


17.3. No non-trivial rational points by descent 


We can show that there are some elliptic curves which have no non-trivial rational points by an easy descent: Suppose that $x, y \in \mathbb{Q}$ are such that $y^2 = x^3 + x$, so that $x = m/n^2$, $y = \ell/n^3$ with $(m, n) = 1$ and

$$\ell^2 = m(m^2 + n^4).$$

Now $(m, m^2 + n^4) = (m, n^4) = 1$, so that both $m$ and $m^2 + n^4$ are squares, say, $m = u^2$ and $m^2 + n^4 = w^2$. Therefore

$$n^4 + u^4 = m^2 + n^4 = w^2,$$

which has no non-trivial solutions, by Theorem. The trivial solutions have either $n = 0$ (corresponding to the point $O$ at $\infty$) or $u = 0$ (corresponding to the point $(0, 0)$ of order two). Hence $E(\mathbb{Q}) \cong \mathbb{Z}/2\mathbb{Z}$.
17.4. The group of rational points of $y^2 = x^3 - x$

We apply ideas from elementary number theory to rational points on the elliptic curve

$$E:\ y^2 = x^3 - x.$$

There are several obvious rational points: the point at infinity, as well as the three points of order two, $(-1, 0)$, $(0, 0)$, $(1, 0)$. Are there any others? We can write $x = m/n^2$, $y = \ell/n^3$ with $(\ell m, n) = 1$ to obtain

$$(m - n^2)m(m + n^2) = \ell^2.$$
Here, the product of three integers equals a square, so we can write each of them as a squarefree integer times a square, where the product of the three squarefree integers must also equal a square. Note that the three squarefree integers cannot have a common prime factor, or else that factor cubed is a square and so the integers were not squarefree. This means that we can write

$$m - n^2 = pqu^2, \quad m = prv^2, \quad m + n^2 = qrw^2,$$

for some squarefree integers $p, q, r$ which are pairwise coprime. Moreover since the product of the integers is positive, and as $m - n^2 < m < m + n^2$, we see that $m + n^2 > 0$, and $m - n^2$ and $m$ have the same sign. Hence we may assume that $p$ has the same sign as $m$, and $q$ and $r$ are positive. Now note that $(m + n^2, m) = (m - n^2, m) = (n^2, m) = 1$, and so $|p| = r = 1$. Moreover $(m - n^2, m + n^2) = (m - n^2, 2n^2) = (m - n^2, 2)$, since $(m - n^2, n^2) = (m, n^2) = 1$. To summarize, we have proved that there are four possibilities for the value of $(p, q, r)$, since $p = -1$ or $1$, $q = 1$ or $2$, and $r = 1$. This leads to four possible sets of equations:


$$\begin{array}{rcrrrr}
m - n^2 &=& u^2 & -u^2 & 2u^2 & -2u^2\\
m &=& v^2 & -v^2 & v^2 & -v^2\\
m + n^2 &=& w^2 & w^2 & 2w^2 & 2w^2
\end{array}$$

Table 17.1. One of these cases must hold, for some integers $u$, $v$, and $w$.


The solutions with $(m, n) = (1, 0), (0, 1), (1, 1), (-1, 1)$, obtained from the four rational points $O$, $(0, 0)$, $(1, 0)$, $(-1, 0)$, correspond to these four cases, respectively.


Suppose we have another rational point on the curve, say, $P = (m/n^2, \ell/n^3)$. Let $Q = P + O$, $P + (0, 0)$, $P + (1, 0)$, or $P + (-1, 0)$, according to which of the four ways $(m - n^2)m(m + n^2)$ factors. Writing $Q = (M/N^2, L/N^3)$ we find that $Q$ always belongs to the first class; that is, each of $M - N^2$, $M$, and $M + N^2$ is a square. For example, the line between $P$ and $(0, 0)$ is $mny = \ell x$. If $P + (0, 0) = (u, v)$, then $u + m/n^2 + 0 = (\ell/(mn))^2$, so that $u = (\ell^2 - m^3)/(m^2n^2) = -n^2/m$, since $\ell^2 = m^3 - mn^4$. Hence we see that $m$ and $n^2$ in the second set of equations must be replaced by $M = n^2$ and $N^2 = -m$, respectively, so that $M - N^2 = n^2 + m = w^2$, $M = n^2$, $M + N^2 = n^2 - m = u^2$, respectively, as claimed.

Therefore any rational point leads to a solution in the first case, and from there to an integer solution to the equation

$$X^4 - Y^4 = Z^2 \quad\text{with } (X, Y) = 1 \text{ and } X + Y \text{ odd},$$

taking $(X, Y, Z) = (v, n, uw)$. But this was shown to have no non-trivial solutions in Theorem 6.2, by constructing a smaller solution from any given solution. Therefore

$$E(\mathbb{Q}) = \{O, (0, 0), (1, 0), (-1, 0)\}.$$
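A brute-force search illustrates this result, although it proves nothing, since it only looks at small heights. The Python sketch below is our own, and its function name is hypothetical. It tests every $x = m/n^2$ with $(m, n) = 1$ in a box, asking whether $(m - An^2)m(m + An^2)$ is a perfect square. For $A = 1$ only the three points of order two appear. For comparison, $A = 6$ (the 3-4-5 triangle) yields many points.

```python
from fractions import Fraction
from math import gcd, isqrt


def small_x_coordinates(A: int, n_max: int, m_max: int) -> set:
    """x = m/n^2 with (m, n) = 1, n <= n_max, |m| <= m_max, on y^2 = x^3 - A^2 x."""
    found = set()
    for n in range(1, n_max + 1):
        for m in range(-m_max, m_max + 1):
            if gcd(m, n) != 1:
                continue
            value = (m - A * n * n) * m * (m + A * n * n)  # equals (n^3 y)^2
            if value >= 0 and isqrt(value) ** 2 == value:
                found.add(Fraction(m, n * n))
    return found


print(sorted(small_x_coordinates(1, 30, 1000)))   # only -1, 0, 1
print(sorted(small_x_coordinates(6, 10, 1000)))
```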


This approach generalizes to all elliptic curves, though we will focus on the 
congruent number curves. 


17.5. Mordell's Theorem: $E_A(\mathbb{Q})$ is finitely generated


We now indicate how to generalize the above proof to rational points $P \in E_A(\mathbb{Q})$, for arbitrary squarefree, positive integers $A$. If $P \ne O$, then we can write $P = (x_P, y_P)$ with $x_P = m/n^2$ where $(m, n) = 1$, as we saw in Proposition 17.2.1, and such that

$$(m - An^2)m(m + An^2) = \ell^2$$

for some integer $\ell$. As before we find that $(m, m - An^2) = (m, m + An^2) = (m, A)$ divides $A$, and $(m - An^2, m + An^2)$ divides $2A$. Therefore we can write

$$m - An^2 = pqu^2, \quad m = prv^2, \quad m + An^2 = qrw^2, \qquad (17.5.1)$$

for some squarefree integers $p, q, r$ which are pairwise coprime, with $q, r > 0$. Since $pqr$ divides $2A$, there are only finitely many possible such triples.

There were two key steps of the argument in the previous section that will need to be generalized:
• For the "smallest" $P_0 \in E_A(\mathbb{Q})$ with the same $p, q, r$-values as $P$, show that if $P + P_0 = Q$, then $x_Q$, $x_Q - A$, and $x_Q + A$ are all squares.
• If $Q \in E_A(\mathbb{Q})$ with $p = q = r = 1$, then construct a smaller point of $E_A(\mathbb{Q})$ by a descent argument.


The technical setup for the first step follows from the explicit formulas for adding points:

Proposition 17.5.1. If $P, Q, R = P + Q \in E_A(\mathbb{Q})$, with $P, Q, R \ne O$, for which

$$x_Px_Q, \quad (x_P - A)(x_Q - A), \quad\text{and}\quad (x_P + A)(x_Q + A)$$

are each squares, then $x_R$, $x_R - A$, and $x_R + A$ are all squares.


Proof. This holds when $P = Q$ by Proposition 17.2.1. Now assume that $P \ne Q$. We have

$$x_R = \frac{1}{x_Px_Q}\left(\frac{x_Py_Q - x_Qy_P}{x_Q - x_P}\right)^2.$$

Therefore if $x_Px_Q$ is a square, then so is $x_R$. Similarly

$$x_R - A = \frac{1}{(x_P - A)(x_Q - A)}\left(\frac{(x_P - A)y_Q - (x_Q - A)y_P}{x_Q - x_P}\right)^2,$$

and the analogous expression holds with $A$ replaced by $-A$. The result follows immediately from these formulas provided that $y_Py_Q \ne 0$ (so that each of $x_Px_Q$, $(x_P - A)(x_Q - A)$, $(x_P + A)(x_Q + A)$ is non-zero).

Now assume that $y_P = 0$. Then $y_Q \ne 0$, or else either $Q = P$, so that $P + Q = O$, or $Q$ is another point of order 2, and the hypothesis is not satisfied. We also have $y_R \ne 0$, or else either $Q = O$ or $y_Q = 0$. Therefore $x_R$, $x_R - A$, $x_R + A$ are all non-zero.

Two of the three numbers $x_Px_Q$, $(x_P - A)(x_Q - A)$, $(x_P + A)(x_Q + A)$ are non-zero squares, which implies that two of the three numbers $x_R$, $x_R - A$, $x_R + A$ are non-zero squares. Moreover their product is the non-zero square $y_R^2$, and so the third of these numbers is also a non-zero square.
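The two displayed formulas can be spot-checked with exact arithmetic. The Python sketch below is our own illustration on $E_6: y^2 = x^3 - 36x$; the sample points are ours. It computes $R = P + Q$ by the chord construction and compares $x_R - c$ with the right-hand side of the formula for $c = 0$, $A$ and $-A$. The formulas hold because, after moving the root $c$ to the origin, the product of the three roots of the cubic is the square of the line's intercept.

```python
from fractions import Fraction

A = 6


def chord_sum(P, Q):
    """P + Q on y^2 = x^3 - A^2 x, assuming x_P != x_Q."""
    (xp, yp), (xq, yq) = P, Q
    lam = (yq - yp) / (xq - xp)
    xr = lam * lam - xp - xq
    yr = -(yp + lam * (xr - xp))      # reflect the third intersection point
    return xr, yr


def shifted_formula(P, Q, c):
    """Right-hand side of the formula for x_R - c, with c in {0, A, -A}."""
    (xp, yp), (xq, yq) = P, Q
    inner = ((xp - c) * yq - (xq - c) * yp) / (xq - xp)
    return inner * inner / ((xp - c) * (xq - c))


pairs = [
    ((Fraction(12), Fraction(36)), (Fraction(-3), Fraction(9))),
    ((Fraction(18), Fraction(72)), (Fraction(-2), Fraction(8))),
    ((Fraction(12), Fraction(36)), (Fraction(25, 4), Fraction(-35, 8))),
]
for P, Q in pairs:
    xr, _ = chord_sum(P, Q)
    print([xr - c == shifted_formula(P, Q, c) for c in (0, A, -A)])
```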


Exercise 17.5.1. Suppose that $P \in E_A^+(\mathbb{Q})$ and that it corresponds to a Pythagorean triangle of area $A$, with parameter $t$. Prove that $Q := P + (A, 0) \in E_A^+(\mathbb{Q})$ and that it corresponds to the same Pythagorean triangle of area $A$, but with parameter $T$, where $T = \frac{t + 1}{t - 1}$.




An appropriate construction for the second step of our plan is obtained from 
the following converse theorem, which allows us to construct a smaller solution from 
a larger: 


Theorem 17.2 (The converse theorem). If $Q = (X, Y) \in E_A(\mathbb{Q})$ with $X$, $X - A$, and $X + A$ each non-zero squares, then there exists $P \in E_A^+(\mathbb{Q})$ for which $Q = 2P$.


Proof. Write $X - A = u^2$, $X = v^2$, and $X + A = w^2$, so that $u^2 + w^2 = 2v^2$ and $w^2 - u^2 = 2A$. We now let


Substituting we obtain 


a — Ate | (u— w)(v — w)(v =u) 


y? 7 A(2vu —u-—w) 
Now $2(v - u)(v - w) = 2v^2 - 2v(u + w) + 2uw = u^2 + w^2 - 2v(u + w) + 2uw = (u + w)(u + w - 2v)$, and so the last line becomes $(w^2 - u^2)/2A = 1$. Therefore $P_0 = (x, y) \in E_A(\mathbb{Q})$, and we can verify $Q = 2P_0$, using the equations of Proposition. We select $P = \pm P_0$ or $\pm P_0 + (0, 0)$, as one of these is in $E_A^+(\mathbb{Q})$ by exercise, and we select the sign "$\pm$" so as to get the sign of $Y$ correct.


A plan for the descent, based on these last two results: 


Let $S_A$ be the set of triples $\tau = (p, q, r)$ for which there exists some $P \in E_A^+(\mathbb{Q})$ such that holds. Select $P_\tau \in E_A^+(\mathbb{Q})$ to be that $P \in E_A^+(\mathbb{Q})$ satisfying for which $m$ is minimal. Now, given a point $P \in E_A^+(\mathbb{Q})$ satisfying (17.5.1) for some given $\tau$, let $Q = P + P_\tau$, so that $\tau_Q = (1, 1, 1)$ by Proposition 17.5.1. Then we can determine $R \in E_A^+(\mathbb{Q})$ for which $Q = 2R$, by Theorem 17.2. Therefore $2R = Q = P + P_\tau$, so we expect that $m_R$ is smaller than $m_P$. If so, then we can prove Mordell's Theorem by induction on $m_P$: If $m_R, m_{P_\tau} < m_P$, then we know that $R$ and $P_\tau$ can be expressed in terms of our finite basis, and therefore $P = 2R - P_\tau$ also can. To make this plan work we need to compare $m_R$ with $m_P$.


Height bounds. Expanding the square in the formula for $x_R$ in the proof of Proposition 17.5.1, applied to $Q = P + P_\tau$, we have

$$\frac{m_Q}{n_Q^2} = \frac{m_Pn_P^2(m_{P_\tau}^2 - A^2n_{P_\tau}^4) + m_{P_\tau}n_{P_\tau}^2(m_P^2 - A^2n_P^4) - 2n_Pn_{P_\tau}\ell_P\ell_{P_\tau}}{(m_Pn_{P_\tau}^2 - m_{P_\tau}n_P^2)^2},$$

so that $m_Q$ divides the numerator of the expression on the right. Now $m_P > An_P^2$, so that $A(n_P\ell_P)^2 = (An_P^2)m_P(m_P^2 - A^2n_P^4) < m_P^4$, and therefore

$$Am_Q < 4m_P^2m_{P_\tau}^2.$$

The proof of Theorem 17.1 gives that $m_Q = ((m_R^2 + A^2n_R^4)/g)^2$, where $g := (2\ell_Rn_R, m_R^2 + A^2n_R^4)$, and $g \le 2(m_R, A)^2 \le 2A^2$ by exercise 17.2.3(d). We deduce that

$$m_Q = \left(\frac{m_R^2 + A^2n_R^4}{g}\right)^2 \ge \left(\frac{m_R^2}{2A^2}\right)^2. \qquad (17.5.2)$$

Combining our two estimates, we find that

$$\left(\frac{m_R^2}{2A^2}\right)^2 \le m_Q < \frac{4m_P^2m_{P_\tau}^2}{A},$$

and so $m_R < m_P$ provided $m_P > 4A^2m_{P_\tau}$. We have proved the following result:


Theorem 17.3. For each $\tau \in S_A$ select $P_\tau \in E_A^+(\mathbb{Q})$ to be that $P \in E_A^+(\mathbb{Q})$ satisfying for which $m_P$ is minimal. Let $C_A = 4A^2\max_{\tau\in S_A} m_{P_\tau}$. Let $G_A := \{P \in E_A^+(\mathbb{Q}) : m_P \le C_A\}$. Then $E_A^+(\mathbb{Q})$ is generated, additively, by the points in $G_A$.


Since $G_A$ is finite we deduce that $E_A(\mathbb{Q})$ is finitely generated; moreover $E_A(\mathbb{Q})$ can be written in the form $T \oplus \mathbb{Z}^r$, where $T$ is the set of torsion points, for some integer $r \ge 0$, the rank of $E_A(\mathbb{Q})$, which is the number of independent generators of infinite order.⁴ This is Mordell's Theorem for congruent number curves.


One can prove Mordell’s Theorem for general elliptic curves by suitably modi- 
fying this proof, as we will discuss in appendix 17A. 


Completely characterizing which integers A are the area of some rational right- 
angled triangle remains an open question of current interest. The beautiful book 
highlights the following surprising result: 


Theorem 17.4 (Tunnell's Theorem). Assume certain widely believed conjectures about elliptic curves.⁵ Then an odd, squarefree integer $A$ is the area of a Pythagorean triangle if and only if

$$\#\{(a, b, c) \in \mathbb{Z}^3 : A = a^2 + 2b^2 + 8c^2 \text{ with } c \text{ odd}\} = \#\{(a, b, c) \in \mathbb{Z}^3 : A = a^2 + 2b^2 + 8c^2 \text{ with } c \text{ even}\}.$$
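Tunnell's criterion is easy to evaluate by machine. The Python sketch below is our own illustration, and its function names are hypothetical. It counts the representations $A = a^2 + 2b^2 + 8c^2$ with $c$ odd and with $c$ even. This form of the criterion is the one for odd $A$, and the equivalence remains conditional on the conjectures mentioned above. For example, it predicts that 5, 7, 13 and 15 are areas of Pythagorean triangles, while 1, 3 and 11 are not.

```python
from math import isqrt


def tunnell_counts(A: int):
    """Return (#reps with c odd, #reps with c even) of A = a^2 + 2b^2 + 8c^2."""
    odd = even = 0
    c_max = isqrt(A // 8)
    for c in range(-c_max, c_max + 1):
        r1 = A - 8 * c * c
        b_max = isqrt(r1 // 2)
        for b in range(-b_max, b_max + 1):
            r2 = r1 - 2 * b * b
            a = isqrt(r2)
            if a * a == r2:
                reps = 1 if a == 0 else 2      # count both a and -a
                if c % 2:
                    odd += reps
                else:
                    even += reps
    return odd, even


def tunnell_predicts_congruent(A: int) -> bool:
    odd, even = tunnell_counts(A)
    return odd == even


print({A: tunnell_counts(A) for A in (1, 3, 5, 7, 11, 13, 15)})
```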


The connection between Pythagorean triangles of area A and the number of 
representations of A by certain quadratic forms is quite mysterious and goes through 
some amazing links between elliptic curves and modular forms. 


Much of the discussion in this chapter is developed from: 


[1] Stephanie Chan, Rational right triangles of a given area, Amer. Math. Monthly 125 (2018), 689-703.


Exercise 17.5.2 (Areas of rational-sided triangles, I). Let T be a triangle with rational side 
lengths. 
(a) Prove that rational-sided isosceles triangles of area 2A are in 1-to-1 correspondence with 
rational-sided right-angled triangles of area A. 
(b) Show that if T has rational area A, then it has rational height, no matter which side is the 
base. 
(c) Draw a perpendicular line from the base of the triangle to the top triangle vertex. This splits 
the triangle into two rational-sided right-angled triangles. Prove that we can parameterize 


4It is worth remarking that there is no simple structure theorem for infinite abelian groups, anal- 
ogous to the result discussed in section [3.16]of appendix 3C for finite abelian groups. For example, the 
group (Q, +) looks very different in that 2Q = Q yet it has infinite rank. 

>This needs to be vague as there is considerable work in defining and explaining the terminology 
in the statements of these conjectures. 


558 17. Rational points on elliptic curves 


a rational scalar multiple of T as follows, where a and 6 are rational numbers > 1: 


b(a? — 1) a(b? — 1) 


(d) Prove that rational number A is the area of a triangle with rational side lengths if and only 
if there exist rational numbers a, b,c for which A = abc?(a + b)(ab— 1). 

(e) Verify that for given A and b there is a 1-to-1 correspondence between such triangles and
rational points on the elliptic curve $E_{A,b}: y^2 = x(x + Ab)(x - A/b)$ (taking $x = Aa$, $y = A^2/bc$) with $x > A/b$.

(f) Show that if we are given a point $(Aa, A^2/bc) \in E_{A,b}(\mathbb{Q})$, then we can determine another
point $(X, Y) \in E_{A,b}(\mathbb{Q})$ with $X = \left(\frac{(a^2+1)bc}{2}\right)^2$.
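Parts (d) and (e) can be checked numerically with exact rational arithmetic. The Python sketch below is ours. It assumes our reading of the figure: the height is 2ab and the base is split into $b(a^2-1)$ and $a(b^2-1)$, with everything scaled by c. It then confirms the area formula, checks that $(Aa, A^2/bc)$ lies on $E_{A,b}$, and applies the standard doubling formula $x(2P) = (x^2 - a_4)^2/4y^2$ for curves $y^2 = x^3 + a_2x^2 + a_4x$. Here $a_4 = -A^2$, so the doubled point has a square x-coordinate.

```python
from fractions import Fraction as Fr

def check_triangle_curve(a, b, c):
    """Numerically check Exercise 17.5.2(d)-(e) for rational a, b > 1 and c != 0.

    Assumed picture (our reading of the figure): height 2ab, base split into
    b(a^2 - 1) and a(b^2 - 1), then everything scaled by c.
    Returns (area A, x, y, x-coordinate of 2(x, y))."""
    a, b, c = Fr(a), Fr(b), Fr(c)
    base = c * (b * (a * a - 1) + a * (b * b - 1))
    height = c * 2 * a * b
    A = base * height / 2
    assert A == a * b * c * c * (a + b) * (a * b - 1)      # formula in part (d)
    x, y = A * a, A * A / (b * c)                           # point in part (e)
    assert y * y == x * (x + A * b) * (x - A / b)          # it lies on E_{A,b}
    # doubling on y^2 = x^3 + a2 x^2 + a4 x: x(2P) = (x^2 - a4)^2 / (4y^2), here a4 = -A^2
    x2 = (x * x + A * A) ** 2 / (4 * y * y)
    return A, x, y, x2

print(check_triangle_curve(2, 3, 1))   # the 15-20-25 triangle, of area 150
```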
Exercise 17.5.3 (Areas of rational-sided triangles, II). Fix a squarefree positive integer A. Let
$t = A^2u^4$ for some $u \in \mathbb{Q}$ and then


2(t + 1)? A(t + 1)u? 
a= a(t + 1)" and b= BA(E + Lut 
9Au2 (2t — 1)? 
Use these parametrizations to prove that there are infinitely many distinct rational-sided triangles 
of area A. Exhibit several of area 1. 


17.6. Some nice examples 


Diagonal cubic surface. G. H. Hardy, visiting Ramanujan as he lay ill from 
pneumonia in an English hospital, remarked that the number 1729 of the taxicab he 
had ridden from the train station to the hospital was extremely dull. Ramanujan 
contradicted him by noting that it is the smallest number which is the sum of two 
cubes in two different ways: 


$$1^3 + 12^3 = 9^3 + 10^3 = 1729.$$


(Ramanujan might also have mentioned that it is the third smallest Carmichael 
number!) 
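Both claims about 1729 can be confirmed by a short search. In the Python sketch below (ours, for illustration), Carmichael numbers are detected with Korselt's criterion: n is composite and squarefree, and $p - 1$ divides $n - 1$ for every prime p dividing n. Searching b up to 20 is enough, because any smaller two-way sum would need both cubes below 1729.

```python
from collections import defaultdict

def smallest_two_way_cube_sum(limit=20):
    """Smallest n = a^3 + b^3 (1 <= a <= b <= limit) with at least two representations."""
    reps = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            reps[a ** 3 + b ** 3].append((a, b))
    return min((n, r) for n, r in reps.items() if len(r) >= 2)

def is_carmichael(n):
    """Korselt: n composite, squarefree, and p - 1 | n - 1 for each prime p | n."""
    m, p, primes = n, 2, []
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return False              # not squarefree
            primes.append(p)
        p += 1
    if m > 1:
        primes.append(m)
    return len(primes) >= 2 and all((n - 1) % (q - 1) == 0 for q in primes)

print(smallest_two_way_cube_sum())                        # (1729, [(1, 12), (9, 10)])
print([n for n in range(3, 2000) if is_carmichael(n)])    # [561, 1105, 1729]
```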


For a given integer m > 1, we are interested in integers a and b for which
$a^3 + b^3 = m$. The rational solutions of $a^3 + b^3 = m$ ($\neq 0$) are in 1-to-1 correspondence
with the rational points (x, y) on the elliptic curve


$$E_m: y^2 = x^3 - 3(12m)^2,$$


via the transformation $u = a + b$, $v = a - b$ and then $y = 36mv/u$, $x = 12m/u$.⁷
So questions about the representation of integers as the sums of two cubes are also
questions about special points on elliptic curves. However, it seems unlikely that
this will help much with understanding the integer solutions to $a^3 + b^3 = m$.
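The transformation can be checked with exact fractions. The Python sketch below is ours, and the function names are hypothetical. It maps the two representations of 1729 to points on $E_{1729}$, verifies the curve equation, and inverts the map using $u = 12m/x$ and $v = y/3x$, which follow from the formulas for x and y.

```python
from fractions import Fraction as Fr

def cube_sum_to_curve(a, b, m):
    """Send a rational solution of a^3 + b^3 = m (with a + b != 0) to E_m: y^2 = x^3 - 3(12m)^2."""
    a, b = Fr(a), Fr(b)
    assert a ** 3 + b ** 3 == m
    u, v = a + b, a - b
    x, y = 12 * m / u, 36 * m * v / u
    assert y * y == x ** 3 - 3 * (12 * m) ** 2
    return x, y

def curve_to_cube_sum(x, y, m):
    """Inverse map: u = 12m/x, v = y/(3x), then a = (u + v)/2, b = (u - v)/2."""
    x, y = Fr(x), Fr(y)
    u, v = 12 * m / x, y / (3 * x)
    return (u + v) / 2, (u - v) / 2

for a, b in [(1, 12), (9, 10)]:
    P = cube_sum_to_curve(a, b, 1729)
    print((a, b), P, curve_to_cube_sum(*P, 1729))
```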


⁶ Littlewood, speaking of Ramanujan’s encyclopaedic knowledge of the properties of many different
numbers, claimed, “Every positive integer was one of his personal friends.”
⁷ A transformation that we highlighted in section


Magic squares and elliptic curves 559 


Exercise 17.6.1. Prove that if a and b are integers for which $a^3 + b^3 = m$, then $|a|, |b| <
(4m/3)^{1/2}$.
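The bound in Exercise 17.6.1 turns the search for integer representations into a finite check. The Python sketch below is ours and is only practical for small m. It tries every a with $|a| \le \sqrt{4m/3}$ and uses an exact integer cube root, so no floating-point rounding is involved.

```python
from math import isqrt

def icbrt(n):
    """Integer cube root: the integer c with c**3 == n, or None if there is none."""
    lo, hi = -abs(n) - 1, abs(n) + 1
    while lo < hi:                        # binary search for the smallest c with c**3 >= n
        mid = (lo + hi) // 2
        if mid ** 3 < n:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo ** 3 == n else None

def integer_cube_reps(m):
    """All integer pairs (a, b) with a^3 + b^3 = m (m >= 1), trying only |a| <= sqrt(4m/3)."""
    bound = isqrt(4 * m // 3) + 1
    reps = []
    for a in range(-bound, bound + 1):
        b = icbrt(m - a ** 3)
        if b is not None:
            reps.append((a, b))
    return reps

print(integer_cube_reps(1729))
```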


We have seen that 1729 is the smallest integer that can be represented in 
two ways. Are there integers that can be represented in three ways, or four ways,
or ...? This is not difficult to answer using the doubling process on the cubic curve
$x^3 + y^3 = 1729$. One can take a rational point on this curve, transform it to a
rational point on the elliptic curve $E_{1729}$, double, and then transform back, but it
is easier to proceed directly. In general we begin with the identity


$$t(t-2)^3 + (1-t)(1+t)^3 = (1-2t)^3.$$
Given $ax^3 + by^3 = cz^3$ let $t = ax^3/cz^3$ and multiply through by $(cz^3)^4$ to obtain
$$a\big(x(by^3+cz^3)\big)^3 + b\big(-y(ax^3+cz^3)\big)^3 = c\big(z(ax^3-by^3)\big)^3, \qquad (17.6.1)$$


a new rational point on $au^3 + bv^3 = c$. In particular if $u^3 + v^3 = 1729$, then
$A^3 + B^3 = 1729$ where


$$A = u\cdot\frac{u^3 - 3458}{1729 - 2u^3}, \qquad B = v\cdot\frac{u^3 + 1729}{1729 - 2u^3}. \qquad (17.6.2)$$


So, starting from the solution (12, 1), we get further solutions


$$\left(\frac{20760}{1727}, \frac{-3457}{1727}\right), \quad \left(\frac{184026330892850640}{15522982448334911}, \frac{61717391872243199}{15522982448334911}\right),$$


and it is pointless to write down the next solution since each coordinate has around seventy
digits! The main point is that there are infinitely many different solutions, and we
write them as $(p_j/q_j, r_j/q_j)$, $j = 1, 2, \dots$. From (17.6.2) one can deduce that $q_j$
divides $q_{j+1}$ for all $j \ge 1$, and so we have N solutions to $a^3 + b^3 = 1729q_N^3$, taking
$a_i = p_i(q_N/q_i)$ and $b_i = r_i(q_N/q_i)$ for $i = 1, \dots, N$.
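The iteration can be reproduced with exact fractions. The Python sketch below is ours. It writes the map for a general m as $A = u(u^3 - 2m)/(m - 2u^3)$ and $B = v(u^3 + m)/(m - 2u^3)$; for m = 1729 this is the same map, since 3458 = 2 · 1729. Starting from (12, 1) it reproduces the two solutions displayed above and checks that each lies on $a^3 + b^3 = 1729$.

```python
from fractions import Fraction as Fr

def next_point(u, v, m=1729):
    """From a rational point u^3 + v^3 = m, build another one using the
    tangent-line identity (17.6.1) with a = b = 1, c = m."""
    den = m - 2 * u ** 3
    return u * (u ** 3 - 2 * m) / den, v * (u ** 3 + m) / den

u, v = Fr(12), Fr(1)
for j in range(1, 3):
    u, v = next_point(u, v)
    assert u ** 3 + v ** 3 == 1729
    print(j, u, v)
```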

Scaling up rational points seems like a bit of a cheat, so let’s ask whether there 
exists an integer m that can be written in N ways as the sum of two cubes of 
coprime integers. People have found examples for N = 3 and 4 but not beyond, 
and this remains an open question. 


Magic squares and elliptic curves. We discussed magic squares in appendix
1A. In Figure 1.2 of section [13] we parametrized magic squares in terms of 5
variables, $x_1, x_2, x_3, a, b$. The entries of the magic square are integers if our variables
are all integers. Every entry is the square of an integer if and only if each $x_j$, $x_j +
a$, $x_j + b$ is a square, for $j = 1, 2, 3$. Therefore, magic squares in which every entry
is the square of an integer are in 1-to-1 correspondence with


$$(E_{a,b}, P_1, P_2, P_3)$$


where $E_{a,b}$ denotes the elliptic curve $E_{a,b}: y^2 = x(x + a)(x + b)$, each $P_j \in E_{a,b}(\mathbb{Z})$
and $2P_j = (x_j, y_j)$.
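The step "each $x_j$, $x_j + a$, $x_j + b$ is a square, so $(x_j, y_j) = 2P_j$" can be made concrete with the classical halving formula. If $x_0 = r^2$, $x_0 + a = s^2$ and $x_0 + b = t^2$, the x-coordinates of the points P with $2P = (x_0, \pm y_0)$ are $x_0 + rs + rt + st$, taken over the sign choices of r, s, t. We quote this formula without proof, so treat it as an assumption; the Python sketch below (ours, with hypothetical values a = −24, b = 24, $x_0$ = 25) checks its output by doubling. The halves come out as integers, which is consistent with $P_j \in E_{a,b}(\mathbb{Z})$.

```python
from fractions import Fraction
from itertools import product
from math import isqrt

def halves(x0, a, b, r, s, t):
    """x-coordinates of the P with 2P = (x0, +/-y0) on E_{a,b}: y^2 = x(x+a)(x+b),
    given r^2 = x0, s^2 = x0 + a, t^2 = x0 + b (classical halving formula)."""
    assert (r * r, s * s, t * t) == (x0, x0 + a, x0 + b)
    return {x0 + e1 * e2 * r * s + e1 * e3 * r * t + e2 * e3 * s * t
            for e1, e2, e3 in product((1, -1), repeat=3)}

def doubled_x(x, a, b):
    """x(2P) for P = (x, y) on y^2 = x^3 + (a+b)x^2 + ab x, namely (x^2 - ab)^2 / (4y^2)."""
    y2 = x * (x + a) * (x + b)
    return Fraction((x * x - a * b) ** 2, 4 * y2)

# Hypothetical data: x0 = 25 with 25, 25 - 24, 25 + 24 equal to 5^2, 1^2, 7^2.
for x in sorted(halves(25, -24, 24, 5, 1, 7)):
    y2 = x * (x - 24) * (x + 24)
    print(x, isqrt(y2) ** 2 == y2, doubled_x(x, -24, 24))
```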




Further reading on the basics of elliptic curves 


[1] Edray H. Goins, The ubiquity of elliptic curves, Notices Amer. Math. Soc. 66 (2019), 169-174.
[2] Fernando Q. Gouvêa, A marvelous proof, Amer. Math. Monthly 101 (1994), 203-222.
[3] Neal Koblitz, Introduction to Elliptic Curves and Modular Forms, Graduate Texts in Mathematics
97, Springer-Verlag, New York, 1993.
[4] Barry Mazur, Number theory as gadfly, Amer. Math. Monthly 98 (1991), 593-610.
[5] Joseph H. Silverman, Taxicabs and sums of two cubes, Amer. Math. Monthly 100 (1993), 331-340. 


Appendix 17A. General Mordell’s Theorem


We need to be a bit more formal about the possible values of p, q, r in (17.5.1). Let
$H := \mathbb{Q}^*/(\mathbb{Q}^*)^2$, which means that two non-zero rational numbers whose ratio is a
square are considered to be equal in H. We define a map


$$\phi: E_A(\mathbb{Q}) \to \{(a,b,c) \in H^3 : abc = 1 \text{ in } H\} \quad \text{by} \quad \phi((x,y)) = (x - A,\ x,\ x + A)$$


if $x - A$, $x$, $x + A$ are all non-zero. (So if $x = m/n^2$ then $x - A = m - An^2$ in H,
and this $= pq$ in H in (17.5.1).)

If $x + A = 0$, then $x - A$ and $x$ are non-zero, and we let $\phi(P) = (x - A, x, x(x - A))$,
and we define $\phi(P)$ analogously if $x = 0$ or $x - A = 0$. The identities in Proposition
17.2.1 and the proof of Proposition 17.5.1 imply that $\phi$ is a homomorphism.
The converse theorem (Theorem 17.2) implies that $\ker\phi = 2E_A(\mathbb{Q})$. The first
isomorphism theorem then implies that the image of $\phi$ satisfies


$$\phi(E_A(\mathbb{Q})) \cong E_A(\mathbb{Q})/2E_A(\mathbb{Q}).$$


We have been working with $E_A^+(\mathbb{Q})$. The condition $x > A$ implies that if $P \in
E_A^+(\mathbb{Q})$, each coordinate of $\phi(P)$ is positive, and so $\phi(E_A^+(\mathbb{Q}))$ is a subgroup of
$\phi(E_A(\mathbb{Q}))$, and the quotient group $\phi(E_A(\mathbb{Q}))/\phi(E_A^+(\mathbb{Q}))$ is isomorphic to $\{(1,1,1), (-1,-1,1)\}$.


Mordell’s Theorem implies that $E(\mathbb{Q})/2E(\mathbb{Q}) \cong T/2T \oplus (\mathbb{Z}/2\mathbb{Z})^r$. We saw
above how to restrict the elements of $\phi(E(\mathbb{Q}))$ to a finite set where each entry is a
divisor of 2A. It was Weil who first fully developed the role of the map $\phi$. In honor
of their work the group of points $E(\mathbb{Q})$ is known as the Mordell-Weil group.
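To make φ concrete, each non-zero rational can be represented in H by a squarefree integer: $p/q$ and $pq$ differ by the square $q^2$. The Python sketch below is ours. It computes φ on $E_A: y^2 = x^3 - A^2x$. For $x = 0$ it uses $(x - A, (x - A)(x + A), x + A)$, which is our reading of "analogously" (chosen so the product is a square). For A = 6 it reproduces the values $\phi((6,0)) = (2,6,3)$ and $\phi((12,36)) = (6,3,2)$ used in Appendix 17B.

```python
from fractions import Fraction

def squarefree_part(n):
    """The squarefree integer s (sign kept) with n/s a perfect square, for n != 0."""
    sign, n = (-1 if n < 0 else 1), abs(n)
    s, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            s *= d
        d += 1
    return sign * s * n          # any leftover n > 1 is a prime to the first power

def in_H(r):
    """Representative in H = Q*/(Q*)^2 of the non-zero rational r = p/q, namely sf(pq)."""
    r = Fraction(r)
    return squarefree_part(r.numerator * r.denominator)

def phi(x, A):
    """phi(P) for P = (x, y) on E_A: y^2 = x^3 - A^2 x, as squarefree representatives."""
    x = Fraction(x)
    if x == -A:
        triple = (x - A, x, x * (x - A))
    elif x == 0:
        triple = (x - A, (x - A) * (x + A), x + A)   # our convention for x = 0
    elif x == A:
        triple = (x * (x + A), x, x + A)
    else:
        triple = (x - A, x, x + A)
    return tuple(in_H(t) for t in triple)

for x in [12, 6, 18, -6, 0, Fraction(25, 4)]:
    print(x, phi(x, 6))
```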


17.7. The growth of points 


Another important issue is the growth of the size of the coordinates after successive
doubling. We define the height of a rational number a/b with (a, b) = 1
to be $H(a/b) := \max\{|a|, |b|\}$. We extend this to a point $P \in E(\mathbb{Q})$ by letting






$H(P) := H(x(P))$. If $P \in E_A^+(\mathbb{Q})$, then $x(P) = m/n^2 > A$ and so $H(P) = m$. By
Proposition 17.2.1 and exercise 17.2.3(a),(c) we have that if $Q = 2^kP$ with $k \ge 2$,
then $\frac14 H(Q)^4 \le H(2Q) \le 4H(Q)^4$.
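The growth is easy to watch on $E_6$, starting from P = (12, 36). The Python sketch below is ours. It uses the x-only doubling formula $x(2P) = (x^2 + A^2)^2/4(x^3 - A^2x)$, a standard consequence of the tangent construction on $y^2 = x^3 - A^2x$. It records $H(2^kP)$ for k ≤ 5 and prints $\log H(2^kP)/4^k$, which should level off near $\log \hat H(P)$ in the notation of the next exercise.

```python
from fractions import Fraction as Fr
from math import log

def double_x(x, A):
    """x-coordinate of 2P on E_A: y^2 = x^3 - A^2 x, given x = x(P) with y(P) != 0."""
    return (x * x + A * A) ** 2 / (4 * (x ** 3 - A * A * x))

def height(x):
    """H(a/b) = max(|a|, |b|) for the fraction a/b in lowest terms."""
    return max(abs(x.numerator), x.denominator)

A, x = 6, Fr(12)                 # P = (12, 36) on E_6
heights = [height(x)]
for k in range(1, 6):            # x(2^k P) for k = 1, ..., 5
    x = double_x(x, A)
    heights.append(height(x))

for k, h in enumerate(heights):
    # digits of H(2^k P), and log H(2^k P) / 4^k
    print(k, len(str(h)), log(h) / 4 ** k)
```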


Exercise 17.7.1.† Let $P \in E_A^+(\mathbb{Q})$ and $Q = 4P$.
(a) Prove that $(4^{-1/3}H(Q))^{4^r} \le H(2^rQ) \le (4^{1/3}H(Q))^{4^r}$ for all $r \ge 1$.
(b) Prove that $\lim_{k\to\infty} H(2^kP)^{1/4^k}$ exists, which we denote by $\hat H(P)$, the Néron-Tate height.
(c) Prove that $\hat H(2P) = \hat H(P)^4$.
(d) Prove that $4^{-1/3}H(Q) \le \hat H(Q) \le 4^{1/3}H(Q)$.


One can similarly define the Néron-Tate height for points on an arbitrary elliptic
curve. However, it is much more challenging to obtain suitable upper and lower
bounds on $H(2P)$ in terms of $H(P)$ for the general elliptic curve.


Four squares in an arithmetic progression. If $a - d$, $a$, $a + d$, and $a + 2d$ are
all squares, say, $u_{-1}^2, u_0^2, u_1^2, u_2^2$, then $(-2d/a - 1,\ 2u_{-1}u_0u_1u_2/a^2)$ is a point on
the elliptic curve

$$E: y^2 = (x - 1)x(x + 3).$$
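The claim rests on an algebraic identity: with $x = -2d/a - 1$, the product $(x - 1)x(x + 3)$ equals $4(a - d)a(a + d)(a + 2d)/a^4$, and this is $y^2$ for $y = 2u_{-1}u_0u_1u_2/a^2$. The Python sketch below is ours. It checks the identity exactly over a grid of integer values of a and d, whether or not the four terms are squares.

```python
from fractions import Fraction as Fr

def ap_point(a, d):
    """For a - d, a, a + d, a + 2d (a != 0) return x = -2d/a - 1 and the value
    y^2 = 4(a - d)a(a + d)(a + 2d)/a^4 that the point (x, y) would have."""
    a, d = Fr(a), Fr(d)
    x = -2 * d / a - 1
    y_squared = 4 * (a - d) * a * (a + d) * (a + 2 * d) / a ** 4
    return x, y_squared

# The identity y^2 = (x - 1)x(x + 3) holds for every a != 0 and every d.
for a in range(1, 40):
    for d in range(-40, 41):
        x, y2 = ap_point(a, d)
        assert y2 == (x - 1) * x * (x + 3)
print("identity verified")
```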


In this case, $\mathrm{Image}(\phi)$ is a subgroup of $\langle(-1,-1,1), (2,1,2), (1,3,3)\rangle$, which
is itself a subgroup of $\{(a,b,c) \in H^3 : abc = 1 \text{ in } H\}$. We have the elements
$\phi((-1,2)) = (-2,-1,2)$ and $\phi((-3,0)) = (-1,-3,3)$ of $\mathrm{Image}(\phi)$, and we now
prove that $(-1,-1,1)$ is not in $\mathrm{Image}(\phi)$. If it were, there would exist integers
m, n, u, v, w with (m, n) = 1 for which $m - n^2 = -u^2$, $m = -v^2$, and $m + 3n^2 = w^2$.
But then $-v^2 + 3n^2 = w^2$, and so $-v^2 \equiv w^2 \pmod 3$, but as $\left(\frac{-1}{3}\right) = -1$ this implies
that 3 divides v and w, and so also n and $m = -v^2$. Therefore 3 divides (m, n) = 1, a contradiction.

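We deduce that $\mathrm{Image}(\phi) \cong (\mathbb{Z}/2\mathbb{Z})^2$ and so $E(\mathbb{Q})$ is all torsion, since we have
four rational points of order dividing 2. One can show that

$$E(\mathbb{Q}) = \{O, (1,0), (0,0), (-3,0), (3,\pm 6), (-1,\pm 2)\} \cong \mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/4\mathbb{Z}.$$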
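The group structure can be confirmed by computing orders with the chord-and-tangent law on $y^2 = x^3 + 2x^2 - 3x$, which is $(x - 1)x(x + 3)$ expanded. The Python sketch below is ours. It finds one point of order 1, three of order 2 and four of order 4. That is the profile of $\mathbb{Z}/2\mathbb{Z} \oplus \mathbb{Z}/4\mathbb{Z}$; $\mathbb{Z}/8\mathbb{Z}$ and $(\mathbb{Z}/2\mathbb{Z})^3$ would give different counts.

```python
from fractions import Fraction as Fr

A2, A4 = 2, -3          # (x - 1) x (x + 3) = x^3 + 2x^2 - 3x
O = None                # the point at infinity

def add(P, Q):
    """Chord-and-tangent addition on y^2 = x^3 + A2 x^2 + A4 x."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:              # P + (-P) = O, including 2-torsion doubling
        return O
    if P == Q:
        lam = (3 * x1 * x1 + 2 * A2 * x1 + A4) / (2 * y1)
    else:
        lam = (y2 - y1) / (x2 - x1)
    x3 = lam * lam - A2 - x1 - x2
    return (x3, -(y1 + lam * (x3 - x1)))

def order(P, bound=50):
    """Smallest n >= 1 with nP = O (None if it exceeds bound)."""
    Q, n = P, 1
    while Q is not O:
        Q, n = add(Q, P), n + 1
        if n > bound:
            return None
    return n

points = [O] + [(Fr(x), Fr(y)) for x, y in
                [(1, 0), (0, 0), (-3, 0), (3, 6), (3, -6), (-1, 2), (-1, -2)]]
assert all(P is O or P[1] ** 2 == P[0] ** 3 + A2 * P[0] ** 2 + A4 * P[0] for P in points)
print(sorted(order(P) for P in points))
```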


Translating this back to the original question yields: 


Theorem 17.5 (Fermat’s Theorem). There are no four squares in an arithmetic
progression $a - d, a, a + d, a + 2d$ with $d \neq 0$.
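A brute-force search is consistent with Fermat’s Theorem. The Python sketch below is ours and proves nothing. It looks at progressions $u^2, v^2, v^2 + d, v^2 + 2d$ with $d = v^2 - u^2 > 0$ and $v \le 300$. Three squares in progression turn up often (for example 1, 25, 49), but a fourth square never does.

```python
from math import isqrt

def is_square(n):
    return n >= 0 and isqrt(n) ** 2 == n

def square_progressions(limit):
    """Progressions u^2, v^2, v^2 + d, v^2 + 2d with d = v^2 - u^2 > 0, 1 <= u < v <= limit.
    Returns (how many have a square third term, list of (u, v) whose fourth term is also square)."""
    three, four = 0, []
    for v in range(2, limit + 1):
        for u in range(1, v):
            d = v * v - u * u
            if is_square(v * v + d):
                three += 1
                if is_square(v * v + 2 * d):
                    four.append((u, v))
    return three, four

three, four = square_progressions(300)
print(three, four)      # the list should be empty, as Fermat's Theorem predicts
```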


Exercise 17.7.2. Use Szemerédi’s Theorem (Theorem 15.6 from section 15.6) and Fermat’s Theorem
to deduce the following: For any δ > 0 there exists a constant $M_\delta$ such that if $N \ge M_\delta$,
then any arithmetic progression of length N contains $< \delta N$ squares. (It is conjectured that the
N-term arithmetic progression with the most squares is $1, 1 + 24, 1 + 24\cdot 2, \dots, 1 + 24(N - 1)$,
which contains about $\sqrt{8N/3}$ squares; the best bound proved to date is at most a little more than
$N^{3/5}$ squares.)
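The count $\sqrt{8N/3}$ for the progression 1, 25, 49, ... is easy to check. The squares in it are exactly the $n^2$ with gcd(n, 6) = 1 and $n^2 \le 24N$, and one third of integers are coprime to 6. The Python sketch below (ours) simply counts them for $N = 10^5$.

```python
from math import isqrt, sqrt

def squares_in_progression(N, start=1, step=24):
    """Number of k in [0, N) for which start + step*k is a perfect square."""
    count = 0
    for k in range(N):
        n = start + step * k
        if isqrt(n) ** 2 == n:
            count += 1
    return count

N = 10 ** 5
print(squares_in_progression(N), sqrt(8 * N / 3))
```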


Exercise 17.7.3 (Another proof that there are infinitely many primes). Suppose not, and that
$p_1, \dots, p_k$ is the complete set of primes. We will color the positive integers as follows: By the
Fundamental Theorem of Arithmetic one can write every positive integer n in the form $p_1^{e_1}\cdots p_k^{e_k}$
where the $e_j$ are integers $\ge 0$. We will color n as $c(n) = p_1^{r_1}\cdots p_k^{r_k}$, where $r_j$ is the least non-negative
residue of $e_j \pmod 2$ for $j = 1, \dots, k$.

(a) Establish that c(n) provides a coloring of the positive integers with $2^k$ colors.

(b) Use van der Waerden’s Theorem (Theorem 15.5 from section 15.6) to establish that there is
a four-term arithmetic progression of integers A, A + D, A + 2D, A + 3D for which c(A) =
c(A + D) = c(A + 2D) = c(A + 3D).

(c) Let a = A/c(A) and d = D/c(A). Prove that each of a, a + d, a + 2d, a + 3d is a square.

(d) Establish a contradiction using Fermat’s Theorem.


Appendix 17B. Pythagorean triangles of area 6


We begin with the (3, 4, 5)-triangle, which yields an infinite sequence of Pythagorean
triangles of area 6 by Theorem 17.1. Are there any other Pythagorean triangles
of area 6? To answer this question we need to determine the Mordell-Weil group
$E_6(\mathbb{Q})$. By considering the divisors of 12, we know that $\phi(E_6^+(\mathbb{Q}))$ is a subgroup of
the group Γ generated (multiplicatively) by (2, 2, 1), (2, 1, 2), (3, 3, 1), (3, 1, 3), which
is itself a subgroup of $\{(a,b,c) \in H^3 : abc = 1 \text{ in } H\}$.

We already know several points in $E_6(\mathbb{Q})$, and therefore several elements
of $\phi(E_6^+(\mathbb{Q}))$: We always have $\phi(O) = (1,1,1)$. The torsion point (6, 0) yields
$\phi((6,0)) = (2,6,3)$. The (3, 4, 5)-triangle yields $P = (12,36) \in E_6^+(\mathbb{Q})$ and $\phi(P) =
(6,3,2)$, and so $\phi(P + (6,0)) = \phi(P)\phi((6,0)) = (3,2,6)$. We claim that


$$\phi(E_6^+(\mathbb{Q})) = \{(1,1,1), (2,6,3), (6,3,2), (3,2,6)\}.$$


To prove this we need to show that the remaining elements of $\Gamma \cong (\mathbb{Z}/2\mathbb{Z})^4$ do
not belong to $\mathrm{Image}(\phi)$. Since these are both groups, we need only show that two
independent generators of $\Gamma/\langle(2,6,3), (6,3,2)\rangle$ do not belong to $\mathrm{Image}(\phi)$:

If $(2,2,1) \in \mathrm{Image}(\phi)$, then there exist integers m, n, u, v, w with (m, n) = 1
for which


$$m - 6n^2 = 2u^2, \quad m = 2v^2, \quad m + 6n^2 = w^2.$$


This leads to $2v^2 + 6n^2 = w^2$ and so $w^2 \equiv 2v^2 \pmod 3$. However $\left(\frac{2}{3}\right) = -1$ and so
3 divides v and w, and hence n, implying that 3 divides (m, n) = 1, a contradiction.

If $(2,1,2) \in \mathrm{Image}(\phi)$, then there exist integers m, n, u, v, w with (m, n) = 1
for which


$$m - 6n^2 = 2u^2, \quad m = v^2, \quad m + 6n^2 = 2w^2.$$


This leads to $v^2 + 6n^2 = 2w^2$ and so $v^2 \equiv 2w^2 \pmod 3$. However $\left(\frac{2}{3}\right) = -1$ and so
3 divides v and w, and hence n, implying that 3 divides (m, n) = 1, a contradiction.






This establishes the claim, and so since $\phi(E_6^+(\mathbb{Q}))$ includes the image of the
torsion point (6, 0) we deduce that $E_6^+(\mathbb{Q})$, and so $E_6(\mathbb{Q})$, has rank 1, the infinite
part generated by P. Finally $E_6(\mathbb{Q}) \cong (\mathbb{Z}/2\mathbb{Z})^2 \oplus \mathbb{Z}$.


Therefore all of the Pythagorean triangles of area 6 are generated by $nP =
(x_n, y_n)$, $n \ge 1$, taking $t = x_n/6$ and $g = 36/|y_n|$.
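The multiples nP can be turned into explicit triangles. The Python sketch below is ours. Instead of the t and g of the text it uses the standard correspondence $(x, y) \mapsto ((x^2 - 36)/|y|,\ 12x/|y|,\ (x^2 + 36)/|y|)$ on $E_6$, which may be normalized differently from the book's parametrization. P gives the (3, 4, 5)-triangle, 2P gives the triangle with sides 7/10, 120/7, 1201/70, and every triangle is checked to be right-angled with area 6.

```python
from fractions import Fraction as Fr

AREA = 6    # E_6: y^2 = x^3 - 36x

def add(P, Q):
    """Chord-and-tangent addition on y^2 = x^3 - AREA^2 x, for P, Q with y != 0 and P != -Q."""
    (x1, y1), (x2, y2) = P, Q
    if P == Q:
        lam = (3 * x1 * x1 - AREA ** 2) / (2 * y1)
    else:
        lam = (y2 - y1) / (x2 - x1)
    x3 = lam * lam - x1 - x2
    return (x3, -(y1 + lam * (x3 - x1)))

def triangle(P):
    """Sides of the right triangle attached to P = (x, y), y != 0, by the standard map."""
    x, y = P
    return ((x * x - AREA ** 2) / abs(y), 2 * AREA * x / abs(y), (x * x + AREA ** 2) / abs(y))

P = (Fr(12), Fr(36))            # the point coming from the (3, 4, 5)-triangle
Q = P
for n in range(1, 4):           # triangles from P, 2P, 3P
    a, b, c = triangle(Q)
    assert a * a + b * b == c * c and abs(a * b) / 2 == AREA
    print(n, a, b, c)
    Q = add(Q, P)
```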


Integer points 


In appendix 6C we need all of the $(x,y) \in E_6(\mathbb{Z})$ with x divisible by 6. The values
t = 2, 3, and 49 given there correspond to the points (12, ±36), (18, ±72), and
(294, ±5040) $\in E_6(\mathbb{Z})$. These are the only such integer points on $E_6(\mathbb{Z})$, but this is
difficult to prove. Siegel’s Theorem tells us that $E(\mathbb{Z})$ is always finite but finding all
of its elements can be quite a challenge. Bennett determines $E_A(\mathbb{Z})$ whenever
A = p or 2p for some odd prime p:


There are several “families” of integral points that only occur in very special
circumstances. For example, if p can be written as the sum of two fourth powers,
say, $p = r^4 + s^4$, then $(-(2rs)^2, \pm 4rs(r^4 - s^4)) \in E_{2p}(\mathbb{Z})$. Or if $p^2 = 2m^2 - 1$ for
some integer m, then $(m^2, \pm(m^3 - m)) \in E_p(\mathbb{Z})$. None of these types of special
circumstances occur when p ≡ ±3 (mod 8). In that case, if $(x,y) \in E_A(\mathbb{Z})$, then
either y = 0 or we have one of the points

$(-3, \pm 9), (-2, \pm 8), (6\cdot 2, \pm 6^2), (6\cdot 3, \pm 6^2\cdot 2), (6\cdot 49, \pm 6^2\cdot 140) \in E_6(\mathbb{Z})$,
$(-4, \pm 6), (5\cdot 9, \pm 5^2\cdot 12) \in E_5(\mathbb{Z})$, $(22\cdot 99, \pm 22^2\cdot 210) \in E_{22}(\mathbb{Z})$,
or $(29\cdot 9801, \pm 29^2\cdot 180180) \in E_{29}(\mathbb{Z})$.
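Every point in the list, and both special families, can be checked against $y^2 = x^3 - A^2x$ with exact integer arithmetic. The Python sketch below is ours. The family helpers are hypothetical names, and the examples $17 = 1^4 + 2^4$ and $7^2 = 2\cdot 5^2 - 1$ are our own choices. The script also reports which listed points have A dividing x.

```python
def on_curve(A, x, y):
    """True if (x, y) lies on E_A: y^2 = x^3 - A^2 x."""
    return y * y == x ** 3 - A * A * x

# The points listed above, keyed by A.
listed = {
    6: [(-3, 9), (-2, 8), (6 * 2, 6 ** 2), (6 * 3, 6 ** 2 * 2), (6 * 49, 6 ** 2 * 140)],
    5: [(-4, 6), (5 * 9, 5 ** 2 * 12)],
    22: [(22 * 99, 22 ** 2 * 210)],
    29: [(29 * 9801, 29 ** 2 * 180180)],
}
for A, pts in listed.items():
    for x, y in pts:
        assert on_curve(A, x, y) and on_curve(A, x, -y)
        print(A, (x, y), "A divides x" if x % A == 0 else "A does not divide x")

def fourth_power_family(r, s):
    """If p = r^4 + s^4, the point (-(2rs)^2, 4rs(r^4 - s^4)) on E_{2p}."""
    p = r ** 4 + s ** 4
    return 2 * p, (-(2 * r * s) ** 2, 4 * r * s * (r ** 4 - s ** 4))

def pell_family(p, m):
    """If p^2 = 2m^2 - 1, the point (m^2, m^3 - m) on E_p."""
    assert p * p == 2 * m * m - 1
    return p, (m * m, m ** 3 - m)

print(fourth_power_family(1, 2), pell_family(7, 5))
```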
We observe that A divides x for most of these points $(x,y) \in E_A(\mathbb{Z})$. For such
points we can provide a good bound on x, assuming the abc-Roth conjecture from


section 


Theorem 17.6. Assume the abc-Roth conjecture. Fix δ > 0. There exists a
constant $c_\delta > 0$ such that if $(x,y) \in E_A(\mathbb{Z})$ with x divisible by squarefree A, then
$|x| \le c_\delta A^{2+\delta}$.


Proof. Select ε such that $(1 - 2\epsilon)(1 + \delta) = 1$. Write $x = AX$ so that $y^2 = x^3 - A^2x =
A^3(X^3 - X)$. Therefore $A^2$ divides y as A is squarefree, and so $X^3 - X = AY^2$
where $y = A^2Y$. We now apply the abc-Roth conjecture with $F(u,v) = uv(u^2 - v^2)$,
so that $F(-1, X) = AY^2$, which yields


$$|X|^{2-\epsilon} \ll \prod_{p \mid AY} p \le A|Y| = (A\cdot AY^2)^{1/2} \le (A|X|^3)^{1/2}.$$


Therefore $|X|^{1/2-\epsilon} \ll A^{1/2}$, that is, $|X| \le c_\delta A^{1+\delta}$ for a suitable constant $c_\delta$, and the result follows since $x = AX$.


There are many techniques to limit integer points in: 


[1] Michael Bennett, Integral points on congruent number curves, Int. J. Number Theory 9 (2013),
1619-1640.


Appendix 17C. 2-parts of abelian groups


17.8. 2-parts of abelian, arithmetic groups 


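Let G be a finite abelian group, written additively, so that, by the Fundamental
Theorem of Abelian Groups (as discussed in section 3.16 of appendix 3C),

$$G \cong H \oplus \mathbb{Z}/2^{e_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/2^{e_\ell}\mathbb{Z}$$

where H is an abelian group of odd order h, say, for some integers $e_1, e_2, \dots, e_\ell \ge 1$.

If $g = (a, b_1, \dots, b_\ell) \in G$ with $2g = 0$, then $2a = 0$ in H, so that $a = \frac{h+1}{2}\cdot 2a = 0$
in H; and each $2b_i \equiv 0 \pmod{2^{e_i}}$, so that $b_i \equiv 0$ or $2^{e_i - 1} \pmod{2^{e_i}}$. Therefore the
elements g with 2g = 0 form a subgroup isomorphic to $(\mathbb{Z}/2\mathbb{Z})^\ell$. As G is finite, G/2G has the same
order (the kernel and cokernel of multiplication by 2 on G have the same size), and every element of G/2G
has order dividing 2, so
$$G/2G \cong (\mathbb{Z}/2\mathbb{Z})^\ell.$$
We call ℓ the 2-rank of G.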
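For a group given as a sum of cyclic groups, the 2-rank is just the number of cyclic factors of even order, since $(\mathbb{Z}/n\mathbb{Z})/2(\mathbb{Z}/n\mathbb{Z}) \cong \mathbb{Z}/\gcd(2,n)\mathbb{Z}$. The Python sketch below (ours) compares three computations on a few small examples: the index of 2G found by brute force, the number of elements g with 2g = 0, and the count of even factors.

```python
from itertools import product

def two_rank_brute(moduli):
    """log_2 |G/2G| for G = Z/n_1 + ... + Z/n_k, found by listing 2G."""
    G = list(product(*(range(n) for n in moduli)))
    twoG = {tuple(2 * g % n for g, n in zip(elt, moduli)) for elt in G}
    index = len(G) // len(twoG)            # |G/2G|, a power of 2
    return index.bit_length() - 1

def two_torsion_size(moduli):
    """Number of g in G with 2g = 0."""
    return sum(1 for elt in product(*(range(n) for n in moduli))
               if all(2 * g % n == 0 for g, n in zip(elt, moduli)))

def two_rank_formula(moduli):
    """Number of cyclic factors of even order."""
    return sum(1 for n in moduli if n % 2 == 0)

for moduli in [(4,), (3, 5), (2, 4, 6), (8, 9, 2), (6, 10)]:
    print(moduli, two_rank_brute(moduli), two_torsion_size(moduli), two_rank_formula(moduli))
```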
More generally if G is a finitely generated abelian group, then we can write

$$G = T \oplus g_1\mathbb{Z} \oplus \cdots \oplus g_r\mathbb{Z} \cong T \oplus \mathbb{Z}^r,$$

where T contains the elements of G of finite order, and $g_1, \dots, g_r$ are linearly
independent elements of G of infinite order. Now T must be a finite abelian group
and therefore has the structure above. We call r the rank of the infinite part of G.
The rank of G is r plus the rank of T. Also


$$G/2G \cong T/2T \oplus (\mathbb{Z}/2\mathbb{Z})^r,$$


a finite group, so that the 2-rank of G equals $r + \ell$, where ℓ is the 2-rank of T (and,
as above, $T/2T \cong (\mathbb{Z}/2\mathbb{Z})^\ell$). In the theory of elliptic curves we wish to determine the
rank of the infinite part of the Mordell-Weil group $E(\mathbb{Q})$, which equals the 2-rank
of $E(\mathbb{Q})$, minus 0, 1, or 2, depending on whether there are 0, 1, or 3 elements of
order 2 in $E(\mathbb{Q})$.




Appendix 17D. Waring’s problem


17.9. Waring’s problem 


In Theorem 12.6 we proved that every positive integer is the sum of four squares.


How about sums of cubes? If $-2 \le a \le 3$ and the integer n satisfies $n \equiv a \equiv a^3 \pmod 6$,
then, writing $b = (n - a^3)/6$,


$$n = (b+1)^3 + (b-1)^3 + (-b)^3 + (-b)^3 + a^3 = 6b + a^3,$$


a sum of five cubes. We can ask about sums of non-negative cubes instead. More
generally, for each integer k ≥ 2, Hilbert showed in 1909 that there exists an integer
g(k) such that every positive integer is the sum of g(k) kth powers of non-negative integers,
resolving a 1770 problem of Waring. We have seen that g(2) = 4, and it is known
that g(3) = 9, g(4) = 19, g(5) = 37, g(6) = 73, .... Actually g(k) grows fast,
because one can need a large number of kth powers to represent small integers:


Exercise 17.9.1. Let $s_k(n)$ be the smallest number of positive integers $a_1, \dots, a_s$ for which
$n = a_1^k + \cdots + a_s^k$.
(a) Prove that $s_k(2^k - 1) = 2^k - 1$.
(b) Prove that if $n = 2^km - 1$ where $m = [(3/2)^k]$, then $s_k(n) = 2^k + [(3/2)^k] - 2$.
Euler’s son conjectured that $g(k) = 2^k + [(3/2)^k] - 2$.
(c)* Prove that if $2^k\{(3/2)^k\} + [(3/2)^k] \le 2^k$, then Euler Jr’s conjecture is true.
This inequality follows if $\{(3/2)^k\} < 1 - (3/4)^k$, which is probably true for all integers k ≥ 2
and is known to hold for $2 \le k \le 10^8$.
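Both the five-cube identity and the values in Exercise 17.9.1 are easy to test by machine. The Python sketch below is ours. `five_cubes` applies the identity above with a chosen in {−2, ..., 3} and a ≡ n (mod 6). `s` computes $s_k(n)$ by dynamic programming over the available kth powers, and for 2 ≤ k ≤ 6 it reproduces $s_k(2^km - 1) = 2^k + m - 2$ with $m = [(3/2)^k]$.

```python
def five_cubes(n):
    """Five integer cubes summing to n, via n = (b+1)^3 + (b-1)^3 + (-b)^3 + (-b)^3 + a^3."""
    a = next(t for t in range(-2, 4) if (n - t) % 6 == 0)   # -2 <= a <= 3, a = n (mod 6)
    b, rem = divmod(n - a ** 3, 6)
    assert rem == 0                          # since a^3 = a (mod 6)
    return [b + 1, b - 1, -b, -b, a]

def s(k, n):
    """s_k(n): least number of positive kth powers summing to n (dynamic programming)."""
    powers, j = [], 1
    while j ** k <= n:
        powers.append(j ** k)
        j += 1
    best = [0] + [n] * n                     # n copies of 1^k always work
    for t in range(1, n + 1):
        for p in powers:
            if p > t:
                break
            best[t] = min(best[t], best[t - p] + 1)
    return best[n]

for k in range(2, 7):
    m = 3 ** k // 2 ** k                     # [(3/2)^k]
    n = 2 ** k * m - 1
    print(k, n, s(k, n), 2 ** k + m - 2)
```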


Let G(k) be the smallest integer such that every sufficiently large integer is the
sum of G(k) kth powers of non-negative integers. If it is true that $s_k(n)$ is only
large for small n, then we might expect G(k) to be significantly smaller than g(k).
By Theorem 12.5 we know that $s_2(7\cdot 4^j) \ge 4$ for all j ≥ 0, and so G(2) = 4 = g(2),
no improvement. But for k = 3 we have $4 \le G(3) \le 7$, smaller than g(3) = 9. In
general $G(k) \le k\log k + k\log\log k + Ck$ for some constant C, much smaller than
g(k), which is $\ge 2^k$.






Stop the press. As we finish proofreading this book, in September 2019, there
have been extraordinary developments on the question of which integers can be
represented as the sum of three cubes. Since every cube is ≡ −1, 0, or 1 (mod 9), a
sum of three cubes cannot be ≡ 4 or 5 (mod 9) (see exercise 6.5.10). Heath-Brown
has conjectured that every integer ≢ 4 or 5 (mod 9) can be written in infinitely
many different ways as the sum of three cubes of integers (positive or negative).
Until this month we had only known the very small solutions $1^3 + 1^3 + 1^3 =
4^3 + 4^3 - 5^3$ for 3, and feared there might be no more, but a widely distributed
computation (on the “Charity Engine”) found a new solution where the integers
being cubed have up to twenty digits each. Moreover the first solutions were found
for the only remaining cases up to 100 without a known solution, namely


$$33 = 8866128975287528^3 - 8778405442862239^3 - 2736111468807040^3$$
and
$$42 = 80435758145817515^3 + 12602123297335631^3 - 80538738812075974^3.$$
This lends evidence to Heath-Brown’s conjecture. 
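Python integers have arbitrary precision, so the identities quoted above can be checked exactly. The sketch below is ours. It also confirms the mod 9 obstruction by listing every residue that a sum of three cubes can take modulo 9.

```python
# The three-cube identities quoted above, checked with exact integer arithmetic.
sums = {
    3: [(1, 1, 1), (4, 4, -5)],
    33: [(8866128975287528, -8778405442862239, -2736111468807040)],
    42: [(80435758145817515, 12602123297335631, -80538738812075974)],
}
for n, triples in sums.items():
    for x, y, z in triples:
        assert x ** 3 + y ** 3 + z ** 3 == n

# Cubes are -1, 0 or 1 mod 9, so a sum of three cubes is never 4 or 5 mod 9.
residues = {(x ** 3 + y ** 3 + z ** 3) % 9 for x in range(9) for y in range(9) for z in range(9)}
print(sorted(residues))   # [0, 1, 2, 3, 6, 7, 8]
```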
We finish this section by giving Liouville’s proof that g(4) ≤ 53:


Theorem 17.7 (Liouville). Every positive integer can be written as the sum of 53 
fourth powers. 


Proof. Lagrange’s Theorem (Theorem 12.6) states that any positive integer m can
be written as the sum of four squares, say, $m = n_1^2 + n_2^2 + n_3^2 + n_4^2$. Therefore


$$6m^2 = 6(n_1^2 + n_2^2 + n_3^2 + n_4^2)^2 = \sum_{1 \le i < j \le 4} \left((n_i + n_j)^4 + (n_i - n_j)^4\right),$$


the sum of 12 fourth powers. Now any positive integer q can be written as $m_1^2 +
m_2^2 + m_3^2 + m_4^2$, and so 6q can be written as the sum of 4 × 12 = 48 fourth powers.

Finally any positive integer n can be written as 6q + r for some r, 0 ≤ r ≤ 5, where

6q can be written as the sum of 48 fourth powers, and r as the sum of at most 5

fourth powers, and the result follows.
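Liouville’s argument is constructive, so it can be run directly. The Python sketch below is ours. It finds four-square representations by brute force (fine for small inputs), uses the identity above to write each $6m_i^2$ as 12 fourth powers, and pads with $r$ copies of $1^4$. It then checks that the result sums to n and uses at most 53 fourth powers.

```python
from itertools import combinations
from math import isqrt

def four_squares(n):
    """Some (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n (brute force; fine for small n)."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            for c in range(b, isqrt(n - a * a - b * b) + 1):
                d2 = n - a * a - b * b - c * c
                d = isqrt(d2)
                if d * d == d2:
                    return a, b, c, d
    raise ValueError(f"no representation found for {n}")   # impossible, by Lagrange

def liouville_terms(n1, n2, n3, n4):
    """The 12 bases n_i + n_j, n_i - n_j (i < j) in the identity for 6(n1^2+...+n4^2)^2."""
    ns = (n1, n2, n3, n4)
    return [t for i, j in combinations(range(4), 2)
            for t in (ns[i] + ns[j], ns[i] - ns[j])]

def fourth_powers(n):
    """Non-zero bases whose fourth powers sum to n, following Liouville's proof."""
    q, r = divmod(n, 6)
    bases = []
    for m in four_squares(q):                        # q = m1^2 + m2^2 + m3^2 + m4^2
        bases += liouville_terms(*four_squares(m))   # 6 m^2 as 12 fourth powers
    bases += [1] * r                                 # r = 1^4 + ... + 1^4
    return [b for b in bases if b != 0]

print(fourth_powers(79))
```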


Exercise 17.9.2. By proving that each of 0, 1, 2, 81, 16, and 17 can be written as the sum of at
most 2 fourth powers, deduce that g(4) ≤ 50.


Further reading on Waring’s problem 


[1] H. Davenport, Analytic methods for Diophantine equations and Diophantine inequalities, Cambridge
University Press, 2010.


[2] W. J. Ellison, Waring’s problem, Amer. Math. Monthly 78 (1971), 10-36.


[3] R. C. Vaughan, The Hardy-Littlewood method, Cambridge Tracts in Mathematics 125, Cambridge
University Press, 1997.


Hints for exercises 


EXERCISES IN CHAPTER 0 
Exercise 0.1.1(b). The key observation is that if $\alpha = \frac{1+\sqrt5}{2}$ or $\frac{1-\sqrt5}{2}$ then $\alpha^2 = \alpha + 1$ and
so, multiplying through by $\alpha^{n-2}$, we have $\alpha^n = \alpha^{n-1} + \alpha^{n-2}$ for all n ≥ 2.
Exercise 0.1.3(b). Multiplying through by φ we have $\phi^{n+1} = F_n\phi^2 + F_{n-1}\phi$. Now use (a).
Exercise [(Q.1.5{b). Determine a and 6 in terms of a and then c and d in terms of a, 20, 
and 71. 
Exercise 0.2.1(a). Note that $N^2 + (2N + 1) = (N + 1)^2$.
Exercise 0.3.1. In both parts use induction on n.
Exercise 0.4.2. Use (0.1.1) to establish that $|F_n - \phi^n/\sqrt5| < \frac12$ for all n ≥ 0.


ee If the first character in a string in Ap is a . what must the subsequent 
string look like? What if the string begins with a 1? 
Exercise [0.4.8] Use Gauss’s trick to show that Vacn<p = ‘ee CT ie (eS uz dee 
a product of two integers of opposite parity, both > 1. Show that if N is not a power of
2 (so that it has an odd divisor m > 1), then it is a product of two integers of opposite
parity, both > 1. Determine a and b in terms of N and m.
Exercise a). Verify this for k = 1 and 2, and then for larger k by induction. 

(b) aes k and m as functions of n. 


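Exercise [U-£16] By (0.1.1), $\sqrt5 F_n = \phi^n - \bar\phi^n$, and so $(\sqrt5 F_n)^k = \sum_{j=0}^{k}\binom{k}{j}(-1)^j\rho_j^n$ where
$\rho_j := \phi^{k-j}\bar\phi^j$. Let $x^{k+1} - \sum_{i=0}^{k} c_i x^i := \prod_{j=0}^{k}(x - \rho_j)$. Therefore


$$\sum_{i=0}^{k} c_i(\sqrt5 F_{n+i})^k = \sum_{j=0}^{k}\binom{k}{j}(-1)^j\rho_j^n\sum_{i=0}^{k} c_i\rho_j^i = \sum_{j=0}^{k}\binom{k}{j}(-1)^j\rho_j^{n+k+1} = (\sqrt5 F_{n+k+1})^k.$$


The result follows after dividing through by $(\sqrt5)^k$.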


Exercise 0.6.1(a). Prove this for k = 0, and then by induction on k, using differential
calculus.


Exercise 0.18.3. Substitute the value of y given by the line into the equation of the circle.
Exercise 0.18.4. Subtract the equations for the two circles, and use exercise 0.18.3.






EXERCISES IN CHAPTER 1 
Exercise 1.1.1(a). Write a = db for some integer d. Show that if d ≠ 0, then |d| ≥ 1.


(b) Prove that if u and v are integers for which uv = 1, then either u = v = 1 or
u = v = −1. (c) Write b = ma and c = na and show that bx + cy = max + nay is divisible
by a.


Exercise Use Lemma and induction on a for fixed b. 

Exercise 1.2.1(a). By exercise 1.1.1(c) we know that d divides au + bv for any integers u
and v. Now use Theorem (d) First note that a divides b if and only if −a divides b.
If |a| = gcd(a, b), then |a| divides both a and b, and so a divides b. On the other hand if
a divides b, then |a| ≤ gcd(a, b) ≤ |a| by (c).

Exercise b). Let g = gcd(a, b) and write a = gA, b = gB for some integers A and B.
What is the value of Au + Bv? Now apply (a).

Exercise 1.2.5(a). Use Theorem

Exercise Use Lemma 1.4.1.

Exercise[L.7.5{e). Write r = m-+6 where 0 < 6 < 1, so that [r] =m anda—r =a—m—6 
so that [a — r] =”. 

Exercise 1.7.10. Given any solution, determine u using Lemma 1.1.1.

Exercise 1.7.11. One might apply Corollary

Exercise 1.7.14(d). Use exercise 1.7.10.

Exercise [7.22] For each given m > 1, prove that am|%m, for all r > 1, by induction on 
r, using exercise [0.4.10(a) with k = rm. 

Exercise [1.7.23(a). Prove that gcd(an,b) = gcd(arn_1,6) for all n > 2, and then use 


induction on n > 1, together with Corollary [2.2] (b) Prove that: gcd(tn,In—-1) = 
gcd(ban—2,tn—1) for all n > 2, and then use induction on n > 1, together with Corollary 
1.2.2] (c) Use exercise[0.4.10(a) with k = n—m and then (b). (d) Follow the steps of the 


Euclidean algorithm using (c). 

Exercise Use the matrix transformation for $(u_j, u_{j+1}) \mapsto (u_{j+1}, u_{j+2})$.

Exercise 1.14.1(c). If n is odd, take a = b = c = 1, d = −1. Show that if n is even, then
a, b, c, d are odd so that ad − bc is even.


Exercise 1.17.1. Divide the representation of 2/n above by an appropriate power of 2. Be
careful when 6 is a power of 2. 


EXERCISES IN CHAPTER 2 


Exercise b). Write the integers in the congruence class a (mod d) as a + nd as n
varies over the integers, and partition the integers n into the congruence classes mod k.


Exercise Write the congruence in terms of integers and then use exercise 1.1.1(c).
Exercise Write the congruence in terms of integers and then use exercise [1.1.ife). 
Exercise c). Factor 1001. 


Exercise 2.5.4(a). Split the integers into k blocks of m consecutive integers, and use the
main idea from the first proof of Theorem 2.1. (b) Write N = km + r with 0 ≤ r ≤ m − 1.
Use (a) to get k such integers in the first km consecutive integers, and at most one in the
remaining r. Compare k or k + 1 to the result required.


Exercise b). Use the results for m = 4 from (a). (d) Use the same idea as in (c). 
(e) Study squares mod 8. 


Exercise 2.5.9{b). Use that FC) = nee 


a1 J 




Exercise 2.5.10(a). Treat the cases a > b and a < b separately. (b) Treat the cases c > d
and c < d separately.


Exercise 2.5.13. Proceed by induction on k ≥ 1.
Exercise 2.5.15(b). Use induction.


Exercise 2.5.16(a). Try a proof by contradiction. Start by assuming that the kth pigeonhole
contains $a_k$ letters for each k, and determine a bound on the total number of letters
if each $a_k \le 1$. (b) Use the pigeonhole principle. (c) Use induction.


Exercise 2.5.17(a). Use the pigeonhole principle on pairs $(x_n \bmod d,\ x_{n+1} \bmod d)$.
(d) Use exercise 1.7.24.


Exercise 2.10.2. Use induction.


EXERCISES IN CHAPTER 3 


Exercise 3.0.1. The only divisors of p are 1 and p. Therefore gcd(p, a) = 1 or p, and so
gcd(p, a) = p if and only if p divides a. This implies that gcd(p, a) = 1 if and only if p
does not divide a.
Exercise Use induction and the fact that every integer > 1 has a prime divisor,
as proved in the “prerequisites” section. (The proof will appear as part of the proof of
Theorem 3.2.)
Exercise a). Apply Theorem 3.1 with $a = a_1\cdots a_{k-1}$ and $b = a_k$, and if p divides a,
then proceed by induction.

(b) p divides some $q_i$ by (a), and as $q_i$ only has divisors 1 and $q_i$, and as p > 1, we
deduce that $p = q_i$.
Exercise b). Write $n = 2^km$ with m odd. Then n has an odd prime factor if and
only if m > 1. Therefore if n has no odd prime factor, then $n = 2^k$.
Exercise We have [a,b] = ab by Corollary The result follows from Lemma 
14.1 
Exercise 3.3.1. Look at this first in the case that m and n are both powers of p, say,
$m = p^a$ and $n = p^b$. If d divides m and n, then $d = p^c$, say, with c ≤ a and c ≤ b.
The maximum c that satisfies both of these inequalities is min{a, b}. Similarly if m and
n divide $L = p^e$, then a ≤ e and b ≤ e and so the minimum e that satisfies both of these
inequalities is max{a, b}. Now use this idea when m and n are arbitrary integers.


Exercise Use exercise d). 
Exercise 3.3.7(c). Use exercise 3.3.3(c).
Exercise 3.5.1(a). Show that the aj + b are distinct mod m.


Exercise 3.5.2. Prove that the $r_j \pmod m$ are all reduced residues, and then that they
are distinct.

Exercise If $ar \equiv c \pmod b$, then b divides ar − c. Therefore gcd(a, b) divides ar − c
and so c. In the other direction, we write g = gcd(a, b) and so a = gA, b = gB, c = gC,
and we are looking for solutions to $Ar \equiv C \pmod B$. Then use exercise b).
Exercise Use the second proof of Corollary


Exercise If am + bn = c, then $am + bn \equiv c \pmod b$ (or indeed mod any integer
r ≥ 1). On the other hand if $au + bv \equiv c \pmod b$ and m is any integer $\equiv u \pmod b$, then
$am \equiv au + bv \equiv c \pmod b$ and so there exists an integer n for which am + bn = c.


Exercise 3.7.2(a). We proceed by induction on the number of moduli using exercise 3.2.1.


(b) Replace m in (a) by m − n.




Exercise 3.7.8(a). Work with the prime power divisors of m and use the Chinese Remainder
Theorem.


Exercise Calculate the product mod $p^e$, for every prime power $p^e \| m$.
Exercise Use exercise 1.7.20(a).
Exercise 3.9.3(a). If 2k + 1 = n/m, take $u = \alpha^m$ and $v = \beta^m$ in


$$\frac{u^{2k+1} + v^{2k+1}}{u + v} = (-uv)^k + \sum_{j=1}^{k}(-uv)^{k-j}(u^{2j} + v^{2j}),$$


so that $y_n/y_m$ is a linear polynomial in the $y_{2jm}$ with coefficients that are ± powers of b.
Exercise B.9.6] Use exerciseB.3.7[c), and factor gA® — gB?. 
Exercise a). Write ue as a polynomial in y and z. 


Exercise 3.9.10(a). $\sqrt2 + \sqrt3$ is a root of $x^4 - 10x^2 + 1$. Use Theorem 3.4.

(b) $\sqrt a + \sqrt b$ is a root of $x^4 - 2(a + b)x^2 + (a - b)^2$. Therefore the rational root
$m = \sqrt a + \sqrt b$ must be an integer, and then m divides a − b. Writing a = b + mk we have
$k = \sqrt a - \sqrt b$ so that $b = \left(\frac{m-k}{2}\right)^2$ and $a = \left(\frac{m+k}{2}\right)^2$.
Exercise 3.9.11(b). Prove that $(\sqrt d + m)(\sqrt d - m)$ is an integer.
Exercise 3.9.15(b). Use Corollary
Exercise 3.9.17(b). Write m = gM and n = gN where g = gcd(m, n) so that (M, N) = 1,
and then use exercise 3.7.7 (or exercise 3.9.16(b), for a less complete solution).
Exercise 3.10.2. Write the trinomial coefficient as the product of binomial coefficients.


Exercise 3.11.1. Prove this by induction on n ≥ 1, using the observation in the paragraph
immediately above.


Exercise 3.15.2. Let G = Z/4Z and H be the subgroup of order two. Determine the
maximal order of an element of H ⊕ G/H, as well as of G.

Exercise 3.19.1. Use exercise 0.14.6.

Exercise 3.19.2. R is a Euclidean domain if there exists $w: R \to \mathbb{Z}_{\ge 0}$ such that for any
a, b ∈ R with b ≠ 0, there exists q ∈ R such that if r = a − qb, then r = 0 or w(r) < w(b). Given
any ideal I ≠ {0} of R, let b ∈ I, b ≠ 0, with w(b) minimal. For any a ∈ I let r = a − qb ∈ I; if
r ≠ 0, then w(r) < w(b), which contradicts the minimality of w(b). Hence r = 0 and I = (b).

Exercise 3.22.1. Use Proposition 2.10.1 and adapt the proof of Euclid’s Lemma.


Exercise 3.24.1(c). f(x) = 3x + 1 has the rational root −1/3, yet f(n) ≡ 1 (mod 3) for all
integers n.


EXERCISES IN CHAPTER 4


Exercise One can proceed by induction on the number of distinct prime factors of 
n, using the definition of multiplicative. 
Exercise Pair m with n — m, and then m with n/m. 
Exercise If the prime factors of n are p1 < po <--- < pr, then pj; > k+ J and so 
$@) ype top eo kd 

mn — Llj=1 “p,; j=l k+p  ~ 2h — 2° 
Exercise Let ℓ = (d, a) so that ℓ | a and therefore d/ℓ | (a/ℓ)b with (d/ℓ, a/ℓ) = 1, and
therefore m = d/ℓ divides b.
Exercise 4.2.3(b). What is the power of 2 in σ(n)?
Exercise Give a general lower bound on σ(n).


Exercise 4.2.5(a). If $p^e \| n$, then $1 + \frac1p \le \sigma(p^e)/p^e < \sum_{i \ge 0}\frac{1}{p^i} = \frac{p}{p-1}$.




(b) If n is a perfect number, then σ(n)/n = 2, and if it is odd with ≤ 2 prime factors,
then $\prod_{p \mid n}\frac{p}{p-1} \le \frac32\cdot\frac54 = \frac{15}{8}$, which is < 2, contradicting (a).


Exercise 4.3.7(a). Use exercise 3.9.15(a).
Exercise 4.3.1(a). Prove this when a and b are both powers of a fixed prime and then
use multiplicativity.
Exercise 4.3.12. In both parts write, for each d | n, the integers m = an/d with (a, d) = 1.
Use exercise 4.1.3.
Exercise 4.3.13(a). You could use the second part of exercise 4.1.3.
Exercise 4.3.15(b). Use multiplicativity. (e) Use exercise 4.2.5.
Exercise 4.5.1(a). Use the binomial theorem. (b) Let $m = \prod_{p \mid n} p$ and x = −1 in (a).
Exercise Expand the right-hand side.
Exercise Let r = (a, m) and then s = a/r and t = m/r, which therefore must be
coprime. Now a = rs divides mn = rtn, so that s divides tn and therefore s divides n as
(s, t) = 1. Let u = n/s and we finally deduce b = mn/a = tu.
Exercise 4.8.2. Use the expansion $\phi(n) = \sum_{d \mid n}\mu(n/d)d$ from the proof of Theorem 4.1 in
section 4.4, and a similar expression for σ.
Exercise b). How large is the set {n >1: |f(n)| > n7}? 
Exercise .IL.1] For n = 1,2,... determine the coefficient of t” in a(t)b(t) = 1 asa 
polynomial in the a; and b;, and then find where a given bm first appears. 
Exercise 4.14.1(c). Use that {t} = t − [t], and cut the integral up into intervals [n, n + 1),
taking the integral up to integer N and then letting N → ∞.
Exercise 4.15.1(a). Write t = n + u, subtract log n, and then integrate the first few terms
of the resulting Taylor series.
Exercise 4.16.1(b). Use the Fundamental Theorem of Algebra.
Exercise 4.16.4. Use the Möbius inversion formula from section

EXERCISES IN CHAPTER 5 
Exercise Show that if 22"" <a< aa then there are > n primes up to x. Then 
give a lower bound for n as a function of x. 
Exercise Show that if every prime factor of n is ≡ 0 or 1 (mod 3), then n ≡ 0 or 1
(mod 3).
Exercise 5.3.4. Consider splitting arithmetic progressions mod 3 into several arithmetic
progressions mod 6.
Exercise 5.3.5. One might use exercise 3.1.4(b) in this proof.


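Exercise 5.4.1(b). We wish to show that $\pi(x + \epsilon x) > \pi(x)$. By (6.4.2) (and footnote 14) we
know that for any fixed $\delta > 0$ we have $(1 - \delta)\frac{x}{\log x} < \pi(x) < (1 + \delta)\frac{x}{\log x}$ if $x$ is sufficiently
large. The result will then follow if the middle inequality holds in

$$\pi(x) < (1 + \delta)\frac{x}{\log x} < (1 - \delta)\frac{x + \epsilon x}{\log(x + \epsilon x)} < \pi(x + \epsilon x).$$

Now $\frac{\log(x + \epsilon x)}{\log x} < 1 + \frac{\epsilon}{\log x}$ as $\log(1 + \epsilon) < \epsilon$, and so the middle inequality follows if
$1 + \frac{\epsilon}{\log x} < (1 - \delta)(1 + \epsilon)/(1 + \delta)$. Selecting, say, $\delta = \epsilon/3$, this holds if $x$ is sufficiently large.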
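To see the "sufficiently large" condition concretely, here is a small numerical sketch, assuming $0 < \epsilon < 1$ and $\delta = \epsilon/3$ as in the hint; the helper names are ours, and the sieve is only a sanity check on a specific range, not a proof:

```python
import math

def middle_inequality_holds(eps, x):
    """Test 1 + eps/log x < (1 - delta)(1 + eps)/(1 + delta) with delta = eps/3."""
    delta = eps / 3
    return 1 + eps / math.log(x) < (1 - delta) * (1 + eps) / (1 + delta)

def prime_count(n):
    """pi(n) via a sieve of Eratosthenes."""
    if n < 2:
        return 0
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for q in range(2, math.isqrt(n) + 1):
        if sieve[q]:
            sieve[q * q :: q] = bytes(len(range(q * q, n + 1, q)))
    return sum(sieve)

print(middle_inequality_holds(0.5, 100), middle_inequality_holds(0.5, 10**4))  # False True
print(prime_count(100), prime_count(150))
```

With $\epsilon = 1/2$ the right-hand side is $15/14$, so the condition needs $\log x > 7$; this is why $x = 100$ fails while $x = 10^4$ passes.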


Exercise 5.8.11. Use l'Hôpital's rule.


Exercise [5.8.12] First prove that (Lie) wiz) | wep > las @— oo. 


Exercise 5.8.14(a). Use Corollary 2.3.1.




Exercise 5.9.1. Either use Kummer's Theorem (Theorem 3.7) or consider directly how
often $p$ divides the numerator and denominator of the binomial coefficient.
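Kummer's Theorem says that the exact power of $p$ dividing $\binom{a+b}{a}$ equals the number of carries when adding $a$ and $b$ in base $p$. A small sketch that checks this numerically (the helpers `carries` and `valuation` are ours):

```python
from math import comb

def carries(a, b, p):
    """Number of carries when adding a and b in base p."""
    count = carry = 0
    while a or b or carry:
        digit_sum = a % p + b % p + carry
        carry = 1 if digit_sum >= p else 0
        count += carry
        a //= p
        b //= p
    return count

def valuation(n, p):
    """Largest e with p**e dividing n (n > 0)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

print(carries(3, 5, 2), valuation(comb(8, 3), 2))  # 3 3
```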

Exercise Use induction to show that, for each n > 6, every integer in [7,2N + 6] is 
the sum of distinct primes in {2,3,...,2N}, by induction on N > 1. 

Exercise Let p be a prime in [2n, 4n]. Now construct all the pairs you can that sum 
to p. Proceed. 

Exercise [5.10.1] Maximize the log of the ratio using calculus. 

Exercise 5.10.2. Use Proposition 5.10.1.

Exercise 5.10.3(a). If $r < s/2$, then by Bertrand's postulate there is a prime $p \in (s/2, s] \subseteq$
$(r, s]$. Otherwise $k = s - r \le r$. In either case, by Bertrand's postulate or the Sylvester–Schur
Theorem, one term has a prime factor $p > k$, and so this is the only term that can
be divisible by $p$.

Exercise 5.11.8(b). Use the Fundamental Theorem of Algebra mod $p$ (see Lagrange's
Theorem, Proposition 7.4.1).

Exercise 5.11.9(a). This can be proved by induction on $k$. For $k = 0$ this is trivial. For larger
$k$, let $T \subseteq \{1, 2, \dots, m-1\}$, and pair together the terms for $S = T$ and $S = T \cup \{m\}$
in our sum. The sum therefore becomes

$$\sum_{T \subseteq \{1, \dots, m-1\}} (-1)^{m-1-|T|} \Big( \big(x_m + x_0 + \sum_{j \in T} x_j\big)^k - \big(x_0 + \sum_{j \in T} x_j\big)^k \Big) = \sum_{i=0}^{k-1} \binom{k}{i} x_m^{k-i} \sum_{T \subseteq \{1, \dots, m-1\}} (-1)^{m-1-|T|} \big(x_0 + \sum_{j \in T} x_j\big)^i,$$

and the result follows by induction, as $m - 1 > k - 1 \ge i$.
(b) Let $x_0 = \log n$ and, if $n$ has prime factors $p_1, \dots, p_m$, let $x_j = -\log p_j$ for
each $j \ge 1$.
(c) We get $k!\, x_1 \cdots x_k$ in (a), and so $(-1)^k k! \prod_{p \mid n} \log p$ in (b). We prove this by
induction using the proof in (a), since in the induction step only the $i = k - 1$ term
remains, which is the result from the previous step multiplied by $k x_k$.
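The sum in this exercise is an iterated finite difference, so it can be checked by brute force. The sketch below assumes the sign attached to $S \subseteq \{1, \dots, m\}$ is $(-1)^{m-|S|}$, which is the convention that matches (c); the function name is ours:

```python
from itertools import combinations

def alternating_sum(x0, xs, k):
    """Sum over subsets S of {1..m} of (-1)^(m - |S|) * (x0 + sum_{j in S} x_j)^k.

    The sign convention (-1)^(m - |S|) is an assumption chosen to match part (c).
    """
    m = len(xs)
    total = 0
    for size in range(m + 1):
        for S in combinations(xs, size):   # subsets by position
            total += (-1) ** (m - size) * (x0 + sum(S)) ** k
    return total

print(alternating_sum(3, [2, 5, 7], 2))  # 0, since m > k
print(alternating_sum(3, [2, 5, 7], 3))  # 420 = 3! * 2 * 5 * 7, since m = k
```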


Exercise 5.16.1. If $\mathrm{Re}(s) > 1$, then the Euler product for $\zeta(s)$ is absolutely convergent (as
we proved), and so $\zeta(s) = 0$ if and only if some $1 - p^{-s} = 0$.


Exercise 5.28.3. We will show how to find all the possible orbits when the period has
length 1.


(a) Suppose we have the orbit $0 \to b \to a \to a \to \cdots$ with $0, a, b$ distinct integers. By
Corollary 2.3.1 we have that $b = b - 0$ divides $f(b) - f(0) = a - b$, and so $b$ divides
$a$. Moreover, $a = a - 0$ divides $f(a) - f(0) = a - b$, and so $a$ divides $b$. Therefore
$|b| = |a|$, and so $b = -a$ as $a$ and $b$ are distinct. To find the polynomials $f(x)$ for
which $f(0) = -a$, $f(a) = f(-a) = a$, we extrapolate. The second two conditions give
that $f(x) = a + (x - a)(x + a)h(x)$ for some $h(x) \in \mathbb{Z}[x]$. Substituting in $x = 0$
gives $-a = f(0) = a - a^2 h(0)$, so that $a^2$ divides $2a$; that is, $a = -2, -1, 1$, or $2$ (as
$a \ne 0$).


(b) Can an orbit have the shape $0 \to u \to v \to w \to w \to \cdots$ with $0, u, v, w$ distinct
integers? Let $g(x) = f(x + u) - u$ so that this orbit becomes $-u \to 0 \to b \to a \to$
$a \to \cdots$ where $b = v - u$ and $a = w - u$. Then $b = -a = -2, -1, 1$, or $2$ by (a).
By Corollary 2.3.1 we have that $u = 0 - (-u)$ divides $b - 0 = -a$, and so, as $u \ne \pm a$
(since $-u, 0, b = -a, a$ are distinct), $u = \pm 1$ and $a = \pm 2$. Next $b + u = u - a$ divides
$a - 0 = a$, and so $a = 2u$, which is impossible, as then $a - (-u) = 3u$ would divide
$a - 0 = 2u$.
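Part (a) can be sanity-checked by taking $h(x)$ to be the constant $h(0)$, one simple choice among many. The sketch below (names are ours) finds the $a$ with $a^2 \mid 2a$ and confirms that each resulting $f$ has the orbit $0 \to -a \to a \to a \to \cdots$:

```python
def orbit(f, x0, steps):
    """The first steps + 1 points of the orbit of x0 under f."""
    seq = [x0]
    for _ in range(steps):
        seq.append(f(seq[-1]))
    return seq

# a^2 | 2a with a != 0 forces a in {-2, -1, 1, 2}
candidates = [a for a in range(-50, 51) if a != 0 and (2 * a) % (a * a) == 0]

results = {}
for a in candidates:
    h0 = (2 * a) // (a * a)    # from -a = a - a^2 h(0)
    # constant h(x) = h0, so f(x) = a + (x - a)(x + a) h0
    f = lambda x, a=a, h0=h0: a + (x - a) * (x + a) * h0
    results[a] = orbit(f, 0, 4)

print(candidates)   # [-2, -1, 1, 2]
print(results[2])   # [0, -2, 2, 2, 2]
```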




Exercise 5.28.4(c). If $x_0$ is periodic, then $x_{n+2} = x_n$ for all $n \ge 0$, as the period length is
either one or two. Moreover, if $x_0$ is strictly preperiodic, then $0$ is strictly preperiodic for
the map $x \mapsto f(x + x_0) - x_0$, with orbit $0 \to b_1 \to b_2 \to \cdots$ where $b_n = x_n - x_0$. In all
four cases of our classification $b_2 = b_4$, and so $x_2 = x_4$.


EXERCISES IN CHAPTER 6 


Exercise Study where lines of rational slope, going through the point (2,1), hit the 
curve again. 


Exercise Write down an equation that identifies when three given squares are in 
arithmetic progression. 


Exercise 6.3.1(a). By (6.1.1) the area is $g^2 rs(r^2 - s^2)$ where $r > s \ge 1$ and $(r, s) = 1$. If
this is a square, then each of $r$, $s$, and $r^2 - s^2$ must be squares; call them $x^2$, $y^2$, and $z^2$,
respectively, so that $x^4 - y^4 = z^2$, which contradicts Theorem


(c) Consider a right-angled triangle with sides $x^2$, $2y^2$, $z$.


Exercise 6.5.3. Here $b$ is the hypotenuse, and $c$ is the area. Further hint: We need $b^2 - 4c$
and $b^2 + 4c$ to be integer squares, say, $u^2$ and $v^2$, so that $4c = b^2 - u^2 = v^2 - b^2$. Therefore
$2b^2 = u^2 + v^2$, so $u, v$ have the same parity, and therefore $\left(\frac{v+u}{2}\right)^2 + \left(\frac{v-u}{2}\right)^2 = b^2$. This is
our Pythagorean triangle, which has area $\frac{1}{2} \cdot \frac{v+u}{2} \cdot \frac{v-u}{2} = \frac{v^2 - u^2}{8} = \frac{(b^2 + 4c) - (b^2 - 4c)}{8} = c$.
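The construction in this hint can be run directly: from a hypotenuse $b$ and area $c$ with $b^2 \pm 4c$ both squares, recover the legs $(v+u)/2$ and $(v-u)/2$. A minimal sketch (the function name `triangle_from` is ours):

```python
from math import isqrt

def triangle_from(b, c):
    """Legs of a right triangle with hypotenuse b and area c, if b^2 - 4c and b^2 + 4c are squares."""
    u2, v2 = b * b - 4 * c, b * b + 4 * c
    if u2 < 0:
        return None
    u, v = isqrt(u2), isqrt(v2)
    if u * u != u2 or v * v != v2:
        return None
    x, y = (v + u) // 2, (v - u) // 2      # u and v have the same parity
    assert x * x + y * y == b * b and x * y == 2 * c
    return x, y

print(triangle_from(5, 6))    # (4, 3)
print(triangle_from(13, 30))  # (12, 5)
```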


Exercise 6.5.6. Let $\alpha = p/q$ with $(p, q) = 1$, so that $\alpha = (a\alpha + b)/\alpha = (ap + bq)/p$. Now
$(p, q) = 1$, so comparing denominators we must have $q = 1$, and $p$ divides $ap + bq$, so that
$p$ divides $bq$, and therefore $b$.

Exercise The perimeter of such a triangle has length $2grs + g(r^2 - s^2) +$
$g(r^2 + s^2) = 2gr(r + s)$ where $r > s > 0$. Therefore $n$ has divisors $r$ and $r + s$, where
$r < r + s < 2r$. On the other hand, if $n$ has divisors $d_1, d_2$ for which $d_1 < d_2 < 2d_1$, then
we may assume they are coprime, by dividing through by any common factor. Therefore
$d_1 d_2$ divides $n$, and so we can let $r = d_1$, $s = d_2 - d_1$, and $g = n/d_1 d_2$.

Exercise 6.5.9. Prove that if $n \ge 13$, then $(n+1)^2 + 128 < 2n^2$. Then proceed by induction
on $n$ for $m \in [n^2 + 129, 2n^2)$.


Exercise 6.5.10. What values can cubes take mod 9?
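A two-line computation answers the question in this hint, and shows, for instance, which residues a sum of three cubes can take mod 9 (we are assuming this is the kind of consequence the exercise is after):

```python
cubes = {pow(x, 3, 9) for x in range(9)}
three_cube_sums = {(a + b + c) % 9 for a in cubes for b in cubes for c in cubes}
print(sorted(cubes))            # [0, 1, 8]
print(sorted(three_cube_sums))  # [0, 1, 2, 3, 6, 7, 8], so never 4 or 5 mod 9
```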
Exercise By simple geometry, things must look like the following diagram.

Figure. A circle inscribed inside a right-angled triangle.




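If $A = (0, 0)$, $B = (b, 0)$ and $C = (0, c)$, then $BC$ is the line $by + cx = bc$, and its length $a$ satisfies
$a^2 = b^2 + c^2$. The point $P = \left(\frac{(a + c)r}{a}, \frac{(a + b)r}{a}\right)$, where the circle touches $BC$, lies on
this line, and so $abc = r(b(b + a) + c(a + c)) = r(b^2 + ab + c^2 + ac) = ar(a + b + c)$. Therefore
the radius of the circle is

$$r = \frac{bc}{a + b + c} = \frac{bc(b + c - a)}{(b + c + a)(b + c - a)} = \frac{bc(b + c - a)}{b^2 + c^2 - a^2 + 2bc} = \frac{b + c - a}{2}.$$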
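The two expressions for the inradius of a right triangle with legs $b, c$ and hypotenuse $a$ can be compared exactly on Pythagorean triples. A small sketch, generating triples with Euclid's formula (helper names are ours):

```python
from fractions import Fraction
from math import gcd

def euclid_triples(limit):
    """Primitive Pythagorean triples (b, c, a), a the hypotenuse, from Euclid's formula."""
    for r in range(2, limit):
        for s in range(1, r):
            if (r - s) % 2 == 1 and gcd(r, s) == 1:
                yield r * r - s * s, 2 * r * s, r * r + s * s

def inradius_two_ways(b, c, a):
    """Return bc/(a + b + c) and (b + c - a)/2 as exact fractions."""
    return Fraction(b * c, a + b + c), Fraction(b + c - a, 2)

print(inradius_two_ways(5, 12, 13))  # both equal 2
```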


Exercise 6.10.1. First prove that $(n+i)(n+j) \ne (n+I)(n+J)$ for $k \ge I, J, i, j \ge 1$ unless
$\{I, J\} = \{i, j\}$. Suppose that $(n+i)(n+j) = (n+I)(n+J)$, so that $(i + j - I - J)n = IJ - ij$.
If $I = J = k$ or $I = J = 1$, then evidently $i = j = I = J$. Otherwise $k(k - 1) \ge IJ \ge 2$,
and similarly for $ij$, so that $IJ - ij \equiv 0 \pmod n$ and $n > k^2 - k > |IJ - ij|$. Therefore
$IJ = ij$, and so $I + J = i + j$, which implies that $\{I, J\} = \{i, j\}$. Now if $a_I a_J = a_i a_j$ with
$\{I, J\} \ne \{i, j\}$, we may suppose that $(n+I)(n+J) > (n+i)(n+j)$. Therefore


(n+ (n+ J) — (n+ )(n-+ §) = a0, ((mrmy)t — (mims)*) 
> asa; ((mim; + 1)° — (mim;)*) > Larmi~*ajmé-" 
> &((n + i)(n + 9) */* > 3n4/? > 38n(k — 1) 
as $n > k^2 - k \ge (k - 1)^2$. Therefore
$$3n(k - 1) < (n+I)(n+J) - (n+i)(n+j) = (I + J - i - j)n + (IJ - ij) < (k - 1)(2n + k + 1),$$


which is a contradiction.


EXERCISES IN CHAPTER 7 
Exercise b). Use the technique in the proof of Lemma 7.1.1.
Exercise 7.2.2. Let $k := \mathrm{ord}_m(a)$ and $A = \{1, a, a^2, \dots, a^{k-1} \pmod m\}$. Show that if $b$
and $b'$ are any two reduced residues mod $m$, then $bA$ and $b'A$ are either disjoint or
equal. Therefore the sets of the form $bA$, where $b$ is a reduced residue mod $m$, each of
which has size $k$, partition the $\phi(m)$ reduced residues mod $m$. This implies that $k$ divides
$\phi(m)$, as desired.
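The partition argument above can be watched directly on small moduli. A sketch (helper names are ours; the order computation is a naive loop and assumes $\gcd(a, m) = 1$):

```python
from math import gcd

def multiplicative_order(a, m):
    """ord_m(a), assuming gcd(a, m) = 1 (naive loop)."""
    k, x = 1, a % m
    while x != 1 % m:
        x = x * a % m
        k += 1
    return k

def cosets_of_powers(a, m):
    """Return k = ord_m(a), the reduced residues, and the distinct sets bA."""
    k = multiplicative_order(a, m)
    A = {pow(a, i, m) for i in range(k)}
    units = [b for b in range(1, m) if gcd(b, m) == 1]
    cosets = {frozenset(b * x % m for x in A) for b in units}
    return k, units, cosets

k, units, cosets = cosets_of_powers(2, 15)
print(k, len(units), len(cosets))  # 4 8 2
```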


Exercise Let $k := \mathrm{ord}_q(2)$. We have $2^p \equiv 1 \pmod q$, and so $k$ divides $p$ by Lemma
Therefore $k = 1$ or $p$, but $k \ne 1$ as $2^1 \not\equiv 1 \pmod q$.


Exercise 7.4.1(a). If $n$ is not of the form $p$ or $p^2$, write $n = ab$ with $1 < a < b$. If $n = p^2$
with $p > 2$, then $n$ divides $p \cdot 2p$.
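Assuming, as the hint suggests, that the goal is to show $n \mid (n-1)!$ for composite $n$ other than $4$, a quick computation confirms the pattern (the helper names are ours):

```python
from math import factorial, isqrt

def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

def wilson_residue(n):
    """(n - 1)! mod n."""
    return factorial(n - 1) % n

# composite n with (n - 1)! not divisible by n
print([n for n in range(2, 30) if not is_prime(n) and wilson_residue(n) != 0])  # [4]
```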


Exercise 7.4.3(a). If $Q = (p - 1)/2$, then
$$(p-1)!/Q! = (p-1)(p-2) \cdots (p-Q) \equiv (-1)(-2) \cdots (-Q) = (-1)^Q Q! \pmod p.$$
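Combined with Wilson's Theorem this gives $(p-1)! \equiv (-1)^Q (Q!)^2 \pmod p$. A short numerical check over small primes (helper names are ours):

```python
from math import factorial, isqrt

def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

def check_half_factorial(p):
    """Check (p-1)! ≡ (-1)^Q (Q!)^2 (mod p) with Q = (p-1)/2."""
    Q = (p - 1) // 2
    return factorial(p - 1) % p == ((-1) ** Q * factorial(Q) ** 2) % p

print(check_half_factorial(13))  # True
```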


Exercise [ZE.2b). As $(g^{(p-1)/2})^2 = g^{p-1} \equiv 1 \pmod p$, $g^{(p-1)/2}$ is a square root of $1 \pmod p$;
that is, $g^{(p-1)/2} \equiv 1$ or $-1 \pmod p$. But $g$ has order $p - 1$, and so $g^{(p-1)/2} \not\equiv 1 \pmod p$.
Exercise 7.10.2. Use Proposition 7.4.1.


Exercise 7.10.4. In every solution, $n, n - 1, n - 2$ have prime factors $2, 3, p$ for some $p > 3$.
At most one of these integers is divisible by $p$. Show that the other two lead to a solution
to $2^a - 3^b = \pm 1$, and use exercise 7.10.3.
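A brute-force search over a modest range shows the only solutions of $2^a - 3^b = \pm 1$ in positive integers (proving there are no others is the content of exercise 7.10.3; this sketch only checks small exponents):

```python
solutions = [(a, b) for a in range(1, 61) for b in range(1, 41)
             if abs(2 ** a - 3 ** b) == 1]
print(solutions)  # [(1, 1), (2, 1), (3, 2)]
```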


Exercise 7.10.5(b). Use Theorem
(d) Make sure $a$ is chosen so that $(q, a - 1) = 1$.
Exercise 7.10.6(a). The trick is to write $z^n = ((z - y) + y)^n$ and then use the binomial
theorem. One can also write r, = — and use exercise [2.5.20(a). 
Exercise [7.10.12] Take the j and p — j terms together. 


Exercise 7.10.13. Let $M = a_0 + 1$, so that $a_n = 2^n M - 1$ for all $n \ge 0$. Let $p$ be an odd
prime dividing $a_1$. Then $p$ divides $a_p$.
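Since $2^p \equiv 2 \pmod p$, we get $a_p = 2^p M - 1 \equiv 2M - 1 = a_1 \pmod p$. A quick check of this, assuming $a_n = 2^n M - 1$ as in the hint (the function name is ours):

```python
from math import isqrt

def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

def divides_a_p(M, p):
    """If p divides a_1 = 2M - 1, check that p divides a_p = 2^p M - 1."""
    a = lambda n: 2 ** n * M - 1
    return a(1) % p != 0 or a(p) % p == 0

odd_primes = [p for p in range(3, 60) if is_prime(p)]
print(all(divides_a_p(M, p) for M in range(1, 40) for p in odd_primes))  # True
```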




Exercise 7.10.16(b). Since $n$ is not a Carmichael number, the subgroup in (a) is proper,
and so contains at most half the reduced residues. (c) Let $q = 2p - 1$. Now $n - 1 \equiv p - 1$
$\pmod{2p - 2}$, so that if $(a, n) = 1$, then $a^{n-1} \equiv a^{p-1} \equiv 1 \pmod p$ and $a^{n-1} \equiv a^{p-1} =$
$a^{(q-1)/2} \equiv \pm 1 \pmod q$.
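For $n = pq$ with $p$ and $q = 2p - 1$ both prime, the congruences in part (c) can be verified exhaustively for small $p$. A minimal sketch (the function name `check_pq` is ours):

```python
from math import gcd

def check_pq(p):
    """For primes p and q = 2p - 1, check the congruences of part (c) for n = pq."""
    q = 2 * p - 1
    n = p * q
    assert (n - 1) % (2 * p - 2) == p - 1
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, n - 1, p) == 1
            assert pow(a, n - 1, q) in (1, q - 1)
    return True

print(all(check_pq(p) for p in (3, 7, 19, 31, 37)))  # True
```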

Exercise 7.10.17(a). $M_p - 1 = 2^p - 2$ is divisible by $p$.

Exercise 7.12.1(b). Let $f(x_1, \dots, x_p) = (x_2, \dots, x_p, x_1)$ in part (a).

Exercise 7.17.4(c). Consider $\gcd(q^d - 1, (q^n - 1)/(q^d - 1))$. (d) Use Lemma 7.17.1.


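Exercise 7.18.1. Let $g = \gcd(\lambda(p^e), \lambda(q^f))$, where $p^e$ and $q^f$ are powers of two different
primes dividing $m$ (where $f \ge 2$ if $q = 2$), so that 2 divides $g$. Now

$$\lambda(m) = \mathrm{lcm}[\lambda(p^e) : p^e \| m] \le \frac{1}{g} \prod_{p^e \| m} \lambda(p^e) \le \frac{1}{2} \prod_{p^e \| m} \phi(p^e) = \frac{\phi(m)}{2} < \phi(m).$$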
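The bound $\lambda(m) \le \phi(m)/2$ can be checked numerically for the moduli covered by the hint (two distinct odd primes, or an odd prime together with $4 \mid m$). A brute-force sketch, suitable only for small $m$; the helper names are ours:

```python
from math import gcd

def carmichael_lambda(m):
    """Exponent of the unit group mod m, by brute force (fine for small m)."""
    units = [x for x in range(1, m + 1) if gcd(x, m) == 1]
    L = 1
    while any(pow(x, L, m) != 1 % m for x in units):
        L += 1
    return L

def phi(m):
    return sum(1 for x in range(1, m + 1) if gcd(x, m) == 1)

def odd_prime_factors(m):
    return {q for q in range(3, m + 1, 2)
            if m % q == 0 and all(q % r for r in range(3, q, 2))}

print(carmichael_lambda(15), phi(15))  # 4 8
```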


Exercise 7.18.5(a). Recall exercise 4.3.7.

Exercise 7.19.1. In one direction let $y = x^{n/g}$. In the other, write $g = an + b\lambda(m)$, so that
if $x \equiv y^n$, then $x^a = (y^n)^a (y^{\lambda(m)})^b \equiv y^g \pmod m$.

Exercise 7.25.1(b). You might show that if $p \cdot 1 = q \cdot 1 = 0$ where $p$ and $q$ are distinct
primes, then $1 = 0$.

Exercise 7.25.3(a). You might use the ideas in the proof of Theorem

Exercise [7.28.6] Consider divisibility by F;. where 2” is the highest power of 2 dividing 
k — £. Then we must have p = F; and so g2" 4 = 9?" +14 get 52" + 2° Consider this 
equation mod 2* and we severely limit the possibilities. 

Exercise [7.30.1{b). Multiply through by 1— 2™* and then substitute in x to be a root of 
my (2): 

Exercise 7.33.1(b). Show that $|\Phi_n(a)| > n$ if $\phi(n) > 2$, and for $\phi(n) = 2$ with $|a| > 2$.
Analyze carefully the remaining few cases.

Exercise 7.33.2(b). Prove that $\Delta = (\alpha - \beta)^2$. (c) Use exercise 2.5.20 and be careful when
$p$ divides $a$.


EXERCISES IN CHAPTER 8 


Exercise 8.1.2(b). Use Lemma 8.1.1.
2 
Exercise [8.1.3{a). Use that (< ) (5) 


Exercise 8.1.6(a). The residues $1, g^2, g^4, \dots, g^{p-3} \pmod p$ are evidently distinct and non-
zero squares. As there are $\frac{p-1}{2}$ of them, they are all of the quadratic residues by Lemma
8.1.1.


(b) We see above that $g = g^1$ is not one of the quadratic residues.


$(\pm 1)^2 = 1$.
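A small sketch confirming that the even powers of a primitive root $g$ are exactly the non-zero squares mod $p$, and that $g$ itself is not a square. The primitive root is found by brute force; helper names are ours:

```python
def prime_factors(n):
    fs, q = set(), 2
    while q * q <= n:
        while n % q == 0:
            fs.add(q)
            n //= q
        q += 1
    if n > 1:
        fs.add(n)
    return fs

def primitive_root(p):
    """Smallest primitive root mod an odd prime p (brute force)."""
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1)):
            return g
    raise ValueError("p must be an odd prime")

p = 23
g = primitive_root(p)
squares = {x * x % p for x in range(1, p)}
even_powers = {pow(g, 2 * i, p) for i in range((p - 1) // 2)}
print(g, squares == even_powers, g in squares)  # 5 True False
```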


Exercise There are two solutions to $r^2 \equiv a \pmod p$, say, $r$ and $-r \pmod p$, whose
product is $r \cdot (-r) \equiv -a \pmod p$. Note also that $|S| = \frac{p-3}{2}$.

Exercise [8.4.1] r is the largest integer with 2r —1< i, that is, r< oa 

Exercise [8.4.5] Look at (2/p). 

Exercise 8.7.2(a). Use the Chinese Remainder Theorem and exercise 8.1.2(b).

Exercise 8.7.5. If $a$ is odd, then $a = 1 + 2 \cdot \frac{a-1}{2}$, and so


ee ga 1 es (ee eae: 
2 2 2 2 2 




Exercise Select $a$ with $a^2 \equiv -2 \pmod p$, $a$ odd and minimal, so that $1 \le a \le p - 1$.
Write $a^2 + 2 = pr$. Evidently $pr = a^2 + 2 \equiv 3 \pmod 8$, and so $r \equiv 3p \equiv 5$ or $7 \pmod 8$.
But then $a^2 \equiv -2 \pmod r$, and so $\left(\frac{-2}{r}\right) = 1$ with $r = \frac{a^2 + 2}{p} < p$. This contradicts the
induction hypothesis, and so $\left(\frac{-2}{p}\right) = -1$.
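The conclusion $\left(\frac{-2}{p}\right) = -1$ for $p \equiv 5, 7 \pmod 8$ (and $+1$ for $p \equiv 1, 3 \pmod 8$) can be checked with Euler's criterion. A short sketch; the helper names are ours:

```python
from math import isqrt

def is_prime(n):
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

def legendre(a, p):
    """(a/p) for an odd prime p, by Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

print([p for p in range(3, 60) if is_prime(p) and legendre(-2, p) == -1])
# [5, 7, 13, 23, 29, 31, 37, 47, 53]
```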

Exercise 8.8.1. Suppose that $k \ge \ell \ge 1$. If $r$ is a quadratic residue mod $p^k$, then $r$ is a
quadratic residue mod $p^\ell$, trivially. On the other hand, if $r$ is a quadratic residue mod $p^\ell$,
then it is a quadratic residue mod $p^{\ell+1}$ by Proposition, then mod $p^{\ell+2}$ by Proposition,
etc., up to mod $p^k$. We take $\ell = 1$ if $p$ is odd, and $\ell = 3$ if $p = 2$, and note that if $r$
is a quadratic residue mod 8, then $r \equiv 1 \pmod 8$.
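The lifting statement can be confirmed by brute force for small prime powers, for residues $r$ coprime to $p$: squareness mod $p^k$ agrees with squareness mod $p$ for odd $p$, and with $r \equiv 1 \pmod 8$ for $p = 2$, $k \ge 3$. A sketch (the helper name is ours):

```python
def is_square_mod(r, n):
    """True if x^2 ≡ r (mod n) has a solution (brute force)."""
    return any((x * x - r) % n == 0 for x in range(n))

print([r for r in range(1, 16, 2) if is_square_mod(r, 16)])  # [1, 9]
```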

Exercise 8.9.5(a). Write $n = 3^e m$ where $3 \nmid m$.


Exercise 8.9.9(a). Consider the size of the set of residues $\{a^2 \pmod p\}$ and of the set of
residues $\{m - b^2 \pmod p\}$, as $a$ and $b$ vary.

(b) Take $m = -1$.

(c) Prove there is a solution $u, v$ to $au^2 + bv^2 \equiv -c \pmod p$, and then multiply through
by any $z \pmod p$.
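The pigeonhole argument of part (a) is constructive: the sets $\{a^2\}$ and $\{m - b^2\}$ each have $(p+1)/2$ elements mod an odd prime $p$, so they must meet. A minimal sketch that finds the collision (the function name is ours):

```python
def two_square_rep(m, p):
    """Find (a, b) with a^2 + b^2 ≡ m (mod p) by intersecting {a^2} with {m - b^2}."""
    squares = {}
    for a in range(p):
        squares.setdefault(a * a % p, a)
    for b in range(p):
        target = (m - b * b) % p
        if target in squares:
            return squares[target], b
    return None   # never reached for prime p, by the pigeonhole principle

print(two_square_rep(3, 7))  # (3, 1), since 9 + 1 = 10 ≡ 3 (mod 7)
```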
Exercise 8.9.10(e). Apply Gauss's trick as in the proof of Corollary 7.5.2.
Exercise 8.9.12. For each solution to $y^2 \equiv b \pmod p$, consider whether there are solutions
to $x^2 \equiv y \pmod p$.
Exercise 8.9.14. Let $b^2 \equiv -1 \pmod p$ and study $(1 + b)^2 \pmod p$.


Exercise 8.9.15. Show that if $a$ has order $m \pmod p$, then $\sigma_{a,p}$ consists of $\frac{p-1}{m}$ cycles of
length $m$.
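Assuming $\sigma_{a,p}$ denotes the permutation $x \mapsto ax$ of the non-zero residues mod $p$ (our reading of the exercise), its cycle structure can be computed directly. A sketch with our own helper names:

```python
def multiplicative_order(a, p):
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

def cycle_lengths(a, p):
    """Cycle lengths of x -> a*x on {1, ..., p-1}, for a prime p not dividing a."""
    seen, lengths = set(), []
    for x in range(1, p):
        if x in seen:
            continue
        length, y = 0, x
        while y not in seen:      # walk the cycle through x until it closes
            seen.add(y)
            y = a * y % p
            length += 1
        lengths.append(length)
    return lengths

print(cycle_lengths(2, 7))  # [3, 3], as 2 has order 3 mod 7
```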


Exercise 8.9.16(a). Use exercise 1.7.20(c). (b) Use exercise 1.7.20(b).


Exercise[8.9.17] Select integer m with (m/n) = —1. Consider the prime divisors of integers 
of the form kn + m for well-chosen values of k. 


Exercise 8.9.18(a). Modify the ideas in Euclid's proof that there are infinitely many
primes. (b) $n = -3$. (c) Look at $4m^2 + 3$ with $m$ odd. (d) $n = 3$. Note $(m^2 - 3)/2 \equiv 2$
$\pmod 3$. (e) $n = -4$. Note $m^2 + 4 \equiv 5 \pmod 8$. (f) $n = 2$. Note $m^2 - 2 \equiv 7 \pmod 8$.
(g) $n = -2$. Note $m^2 + 2 \equiv 3 \pmod 8$. (h) $n = -4$ with $(m, 6) = 1$.

Exercise 8.9.24. Therefore $\left(\frac{2}{n}\right) = \left(\frac{2}{n-2}\right)$ if $n \equiv 1 \pmod 4$, and $\left(\frac{2}{n}\right) = -\left(\frac{2}{n-2}\right)$ if $n \equiv 3$
$\pmod 4$, and so the result follows by the induction hypothesis.
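A quick numerical check, using the standard Jacobi-symbol algorithm, that for odd $n \ge 3$ one has $\left(\frac{2}{n}\right) = \left(\frac{2}{n-2}\right)$ when $n \equiv 1 \pmod 4$ and $\left(\frac{2}{n}\right) = -\left(\frac{2}{n-2}\right)$ when $n \equiv 3 \pmod 4$ (the implementation is our own sketch, not the book's):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the usual reciprocity algorithm."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

print(jacobi(2, 15), jacobi(2, 3))  # 1 -1
```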

Exercise 8.10.2. If $N = pq + m$ where $0 \le m \le p - 1$, then $N - p[N/p] = N - pq = m$. If
$r \ge 0$, then $m = r$; and if $r < 0$, then $m = p + r$.

Exercise 8.16.3(f). Prove this first when $n$ is a prime power, and then note that if $m \not\equiv 1$
$\pmod n$, then $m \not\equiv 1 \pmod{p^e}$ for some prime power $p^e \| n$.

Exercise 8.17.2(a). Use the law of quadratic reciprocity. (c) Look back at exercise 8.9.5.
Exercise 8.18.1. Calculate $F_{p-1} \pmod p$ using $F_p$ and $F_{p+1}$, and then proceed by induction.

Exercise 8.18.2. Prove that $F_{(p+1)+n} \equiv -F_n \pmod p$ by induction.

Exercise 8.18.3. Proceed analogously to the two previous exercises.


Exercise 8.18.6(a). Let $m = kr$ in exercise 0.4.10 and use the congruence in exercise
2.5.19(c) with $r = 1$ and $2$. (b) When does $(F_{kr}, F_{kr+1}) \equiv (0, 1) \pmod{p^2}$?




EXERCISES IN CHAPTER 9 


Exercise If $p$ does not divide $a$, then $(b/a)^2 \equiv -1 \pmod p$. Therefore $p = 2$ or
$p \equiv 1 \pmod 4$. We get the same conclusion if $p$ does not divide $b$; otherwise, $p$ divides
$(a, b)$.

Exercise 9.1.4. By induction on $k \ge 1$: it is trivial for $k = 1$, and otherwise let $n_k = a^2 + b^2$
and $n_1 \cdots n_{k-1} = c^2 + d^2$ (by the induction hypothesis), and then the result follows from
(9.1.1).
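The induction multiplies one sum of two squares at a time. The sketch below uses the classical identity $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$, which we assume is the content of (9.1.1); the function names are ours:

```python
def compose(a, b, c, d):
    """(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2."""
    return a * c - b * d, a * d + b * c

def product_as_two_squares(pairs):
    """Write prod(a^2 + b^2) as x^2 + y^2, folding in one factor at a time."""
    x, y = 1, 0
    for a, b in pairs:
        x, y = compose(x, y, a, b)
    return x, y

x, y = product_as_two_squares([(1, 2), (2, 3), (1, 4)])  # 5 * 13 * 17
print(x * x + y * y)  # 1105
```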


Exercise d). Use (a) to prove that $|ac - bd|, |ad - bc| < p$.


Exercise Proceed as in the geometric proof of (6.1.1), or as in the proof of Proposition 9.1.2.


Exercise b). Replace $a$ and $b$ by their absolutely least residues mod $p$.
Exercise b). Select any $b$ with $\left(\frac{b}{p}\right) = -1$ in (a), and let $m = r$ or $s$.


Exercise 9.7.7. We know that $n$ is the length of the hypotenuse of a primitive Pythagorean
triple if and only if there exist coprime integers $r, s$ of different parity with $n = r^2 + s^2$. Hence all of
$n$'s prime factors are $\equiv 1 \pmod 4$, and we know we get at least two representations of $n$
if it has at least two distinct prime factors.


Exercise 9.7.9. Since $m^2 \pm 2$ are odd, they must be $\equiv 3 \pmod 4$, and so must be divisible
by a prime $\equiv 3 \pmod 4$.

Exercise a). In what domains does each of the ranges of $\phi$ lie? (b) We must be in the
middle case (as $y, z \ne 0$), so that $x = y$, in which case $x(x + 4z) = p$. Since $p$ can only be
factored in one way into positive integers, we have $x = 1$, $z = \frac{p-1}{4}$; that is, $v = (1, 1, \frac{p-1}{4})$.
(c) Pair up the elements of $S$ using $\phi$.
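Assuming the map $\phi$ here is Zagier's involution on $S = \{(x, y, z) \in \mathbb{Z}_{>0}^3 : x^2 + 4yz = p\}$, with the three cases $x < y - z$, $y - z < x < 2y$, $x > 2y$, the argument can be run as code. The sketch below (names such as `windmills` are ours) checks that $(1, 1, \frac{p-1}{4})$ is the only fixed point, then uses the odd size of $S$ to find a fixed point of $(x, y, z) \mapsto (x, z, y)$, which gives $p = x^2 + (2y)^2$:

```python
from math import isqrt

def zagier(v):
    """Zagier's involution on solutions of x^2 + 4yz = p."""
    x, y, z = v
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)

def windmills(p):
    """All positive (x, y, z) with x^2 + 4yz = p."""
    S = []
    for x in range(1, isqrt(p) + 1):
        rest = p - x * x
        if rest > 0 and rest % 4 == 0:
            n = rest // 4
            S.extend((x, y, n // y) for y in range(1, n + 1) if n % y == 0)
    return S

def two_squares(p):
    """For a prime p ≡ 1 (mod 4), return (a, b) with a^2 + b^2 = p."""
    S = windmills(p)
    assert [v for v in S if zagier(v) == v] == [(1, 1, (p - 1) // 4)]
    # |S| is odd, so the swap (x, y, z) -> (x, z, y) also has a fixed point
    x, y, _ = next(v for v in S if v[1] == v[2])
    return x, 2 * y

print(two_squares(13))  # (3, 2)
```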


Exercise 9.9.2. Try $a = b = n = 1$.
Exercise 9.11.1(d). Use exercise 8.9.9(c).
Exercise 9.12.2. Use (9.12.1).


Exercise 9.13.2. Use the characteristic polynomial for $A$, which is the polynomial $x^2 - tx + d$
satisfied by $A$, where $t$ is the trace of $A$ and $d$ is the determinant.
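For $2 \times 2$ integer matrices the fact used here, $A^2 - tA + dI = 0$ (Cayley–Hamilton), is easy to confirm exhaustively over a small box of entries. A minimal sketch with our own helper names:

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def satisfies_char_poly(A):
    """Check A^2 - tA + dI = 0 with t = trace(A), d = det(A)."""
    (a, b), (c, e) = A
    t, d = a + e, a * e - b * c
    A2 = mat_mul(A, A)
    return all(A2[i][j] - t * A[i][j] + d * (i == j) == 0
               for i in range(2) for j in range(2))

print(satisfies_char_poly([[2, 1], [1, 1]]))  # True
```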


EXERCISES IN CHAPTER 10 
Exercise 10.3.2. Hopefully $n = pq$ and $\phi(n) = de - 1 = 29 \times 197 - 1 = 5712$; if so, then
$p + q = n + 1 - \phi(n) = 180$. Therefore $(x - p)(x - q) = x^2 - 180x + 5891$, which we factor
to obtain $p$ and $q$.
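The computation in this hint, recovering $p$ and $q$ from $n$ and a guess for $\phi(n)$, takes a few lines: $p + q = n + 1 - \phi(n)$ and $(p - q)^2 = (p + q)^2 - 4n$. A sketch (the function name is ours) that returns `None` if the guess for $\phi(n)$ was wrong:

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover (p, q) with pq = n from phi = (p - 1)(q - 1), or None if phi is wrong."""
    s = n + 1 - phi              # p + q
    disc = s * s - 4 * n         # (p - q)^2
    if disc < 0:
        return None
    r = isqrt(disc)
    if r * r != disc:
        return None
    return (s - r) // 2, (s + r) // 2

print(factor_from_phi(5891, 29 * 197 - 1))  # (43, 137)
```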
Exercise 10.4.2(b). Use Corollary 7.5.3.


Exercise 10.7.5. Since $n$ is a Carmichael number, we know that it is squarefree and has
prime divisors $p$ and $q$, by Lemma 7.6.1. If $a^{(n-1)/2} \equiv -1 \pmod n$, then let $b \equiv 1 \pmod p$
and $b \equiv a \pmod q$, and determine the value of $b^{(n-1)/2} \pmod{pq}$.


Exercise 10.8.6(a). Factor $4x^4 + 1$ and substitute in $x = 2^n$.
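One standard factorization (Sophie Germain's identity) is $4x^4 + 1 = (2x^2 + 2x + 1)(2x^2 - 2x + 1)$; with $x = 2^n$ this splits $2^{4n+2} + 1$. A quick check, with a function name of our own:

```python
def germain_factors(n):
    """Factor 4x^4 + 1 at x = 2^n via 4x^4 + 1 = (2x^2 + 2x + 1)(2x^2 - 2x + 1)."""
    x = 2 ** n
    return 2 * x * x + 2 * x + 1, 2 * x * x - 2 * x + 1

print(germain_factors(1))  # (13, 5), and 13 * 5 = 65 = 2^6 + 1
```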
Exercise 10.19.1(c). Use the quadratic reciprocity law for 2 and $-2$.


EXERCISES IN CHAPTER 11


Exercise 11.2.1. If $y = 0$, then $m_1 n = n_1 m$. Now $(m, n) = (m_1, n_1) = 1$, and so $m_1 = m$
and $n_1 = n$, contradicting our construction of the pair $m, n$.


Exercise 11.2.5. Consecutive powerful numbers of the form $27a^2$ followed by $b^2$, for some
integers $a$ and $b$.


Exercise [11.4.2] Use the product rule to compute the derivative. 
Exercise 11.6.3. Given a smallest solution to $x^2 - dy^2 = 1$, expand $(x + y\sqrt{d})^k \pmod d$.




Exercise 11.6.11(c). Consider the example $1 + (2^m - 1) = 2^m$ with $m > 2/\epsilon$.


Exercise [11.14.3{a). Write & 6 0) tee a : and use that a, > 2. 
(b) Take determinants of the matrix equation so that rs = (—1) 


= = po 7 a b _ [ao 1 
fore s = r or p—r. (c) Take the transpose of Cc i): (d) Write (: ) = @ 3) 


Gm 1 p r\ fa b ae 
dG) oka a esa ce, 
Exercise 11.16.2. One thought is to take $2^{a_0}$ if $a_0 > 0$ and $3^{-a_0}$ if $a_0 < 0$, and then use
the primes 5 and 7 for $a_1$, etc.


n 


(mod p), and there- 


EXERCISES IN CHAPTER 12 


Exercise 12.1.3. Suppose that $d$ is a fundamental discriminant and $[a, b, c]$ is an imprimitive
form of discriminant $d$. If a prime $h$ divides $(a, b, c)$, then $h^2 \mid d$, so that $h = 2$. But then $D = d/h^2 \equiv 0$ or
$1 \pmod 4$, a contradiction. Now suppose that $d$ is not a fundamental discriminant. Then
there exists a prime $p$ such that $d = p^2 D$, where $D \equiv 0$ or $1 \pmod 4$. There is always a
form $g$ of discriminant $D$, and so $pg$ is an imprimitive form of discriminant $d$.


Exercise [12.1.4{c). Study the right-hand side of (12.1.2). 
Exercise |[12.1.5| Take determinants of both sides. 


Exercise 12.1.6. First note that $b \equiv d \pmod 2$, and that if $b = 2k + \delta$ with $\delta$ the least residue
of $d \pmod 2$, then the change of variable $x \to x - ky$ shows that $[1, b, c] \sim [1, \delta, \Delta]$, the
principal form. The value of $\Delta$ must be $(\delta - d)/4$, so that the discriminant is $d = b^2 - 4c$.


Exercise 12.4.1. One example is $d = -171$. We begin by noting that $|b| \le a \le \sqrt{171/3} =$
$\sqrt{57} < 8$ and $b$ is odd. If $b = \pm 1$, then $ac = (1 + 171)/4 = 43$ with $a \le c$, so that $a = 1$. If
$b = \pm 3$, then $ac = (9 + 171)/4 = 45$ with $a \le c$, so that $a = 1, 3, 5$, and $a = 1$ is excluded as $1 < |b|$. If $b = \pm 5$,
then $ac = (25 + 171)/4 = 49$ with $a \le c$, so that $a = 1, 7$, and $a = 1$ is excluded as $1 < |b|$. If $b = \pm 7$, then
$ac = (49 + 171)/4 = 55$ with $a \le c$, so that $a = 1, 5$, which are both $< |b|$. So we are left
with $[1, 1, 43]$, $[3, 3, 15]$, $[5, 3, 9]$, $[5, -3, 9]$, and $[7, 5, 7]$, of which $[3, 3, 15]$ is imprimitive.
Exercise 12.4.2. These are the smallest negative fundamental discriminants of class numbers 1 to 8:

For $d = -3$ we have $[1, 1, 1]$. For $d = -15$ we have $[1, 1, 4]$, $[2, 1, 2]$.

For $d = -23$ we have $[1, 1, 6]$, $[2, \pm 1, 3]$.

For $d = -39$ we have $[1, 1, 10]$, $[2, \pm 1, 5]$, $[3, 3, 4]$.

For $d = -47$ we have $[1, 1, 12]$, $[2, \pm 1, 6]$, $[3, \pm 1, 4]$.

For $d = -87$ we have $[1, 1, 22]$, $[2, \pm 1, 11]$, $[3, 3, 8]$, $[4, \pm 3, 6]$.

For $d = -71$ we have $[1, 1, 18]$, $[2, \pm 1, 9]$, $[3, \pm 1, 6]$, $[4, \pm 3, 5]$.

For $d = -95$ we have $[1, 1, 24]$, $[2, \pm 1, 12]$, $[3, \pm 1, 8]$, $[4, \pm 1, 6]$, $[5, 5, 6]$.

Exercise 12.4.3. These are the smallest even negative fundamental discriminants of class
numbers 1 to 6: for $d = -4$ we have $[1, 0, 1]$; for $d = -20$ we have $[1, 0, 5]$, $[2, 2, 3]$; for
$d = -56$ we have $[1, 0, 14]$, $[2, 0, 7]$, $[3, \pm 2, 5]$; for $d = -104$ we have $[1, 0, 26]$, $[2, 0, 13]$,
$[3, \pm 2, 9]$, $[5, \pm 4, 6]$.
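The lists in exercises 12.4.1 to 12.4.3 can be regenerated by enumerating reduced forms $[a, b, c]$ with $|b| \le a \le c$, $b \ge 0$ whenever $|b| = a$ or $a = c$, and $3a^2 \le |d|$. A minimal sketch with our own function names; the class number counts only primitive forms:

```python
from math import gcd

def reduced_forms(d):
    """Reduced forms [a, b, c] of discriminant d < 0, listed by increasing a."""
    assert d < 0 and d % 4 in (0, 1)
    forms = []
    a = 1
    while 3 * a * a <= -d:
        for b in range(-a + 1, a + 1):          # -a < b <= a
            if (b * b - d) % (4 * a) == 0:
                c = (b * b - d) // (4 * a)
                if c < a or (c == a and b < 0):
                    continue
                forms.append((a, b, c))
        a += 1
    return forms

def class_number(d):
    """Number of primitive reduced forms of discriminant d."""
    return sum(1 for a, b, c in reduced_forms(d) if gcd(gcd(a, b), c) == 1)

print(reduced_forms(-171))  # [(1, 1, 43), (3, 3, 15), (5, -3, 9), (5, 3, 9), (7, 5, 7)]
print(class_number(-171))   # 4, since [3, 3, 15] is imprimitive
```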
Exercise 12.5.3. Use Rabinowicz's criterion, and quadratic reciprocity.


Exercise 12.6.1. Prove and use the inequality $am^2 + bmn + cn^2 \ge am^2 - |b| \max\{|m|, |n|\}^2 + cn^2$.


Exercise 12.6.2(b). Use the smallest values properly represented by each form.
Exercise 12.6.5(c). Use exercise 12.6.2(e).




Exercise 12.6.7(c). Given a solution $B$, let $C = (B^2 - d)/4A$, and then $[A, B, C]$ represents
$A$ properly (at $(1, 0)$). Find a reduced $f \sim [A, B, C]$ and use the transformation matrix to
find the representation as in (b).


Exercise 12.8.1. Prove this one prime factor of $A$ at a time and then use the Chinese
Remainder Theorem. For each prime $p$, try $f(1, 0)$, $f(0, 1)$, and then $f(1, 1)$.


Exercise 12.8.2. If $f = [a, r, u]$, then the transformation $x \to x + ky$, $y \to y$ yields that
$f \sim [a, b, c]$ where $b = r + 2ka$; that is, we can take $b$ to be any value $\equiv r \pmod{2a}$.
Similarly, if $F = [A, s, v]$, then we can take $b$ to be any value $\equiv s \pmod{2A}$. Such a $b$
exists by the Chinese Remainder Theorem provided $r \equiv s \pmod 2$, and $r$ and $s$ have the
same parity as the discriminants of $f$ and $F$.

Exercise 12.11.3. Now $d = b^2 - 4ac = B^2 - 4AC$, and so if $p \mid 4aA$, then $(d/p) = 0$ or $1$.
We will now prove that there are rational points on the curve $aAu^2 = v^2 - dw^2$, by using
Legendre's version of the local-global principle. There are obviously real solutions with
$u = 0$. If an odd prime $p$ divides $aA$ but not $d$, then we have seen that $(d/p) = 1$. If an odd
prime $p$ divides $d$ but not $aA$, then $(aA/p) = (a/p)(A/p) = \sigma_f(p)\sigma_F(p) = 1$. Finally we
have the case in which $p$ divides $a$ and $d$. Hence $p$ divides $b$, and $p \| d$ as $d$ is fundamental,
and so $p \nmid (a/p)c$. So writing $a = pa'$, $b = pb'$, $d = pD$, we have $D = p(b')^2 - 4a'c$, which
implies that $(-a'cD/p) = 1$. We also have $(Ac/p) = \sigma_f(p)\sigma_F(p) = 1$, and so $(-a'AD/p) =$
$(-a'cD/p)(Ac/p) = 1$, as needed. Dividing through by $u$, we have $aA = t^2 - dy^2$ for some
rationals $t, y$; letting $t = 2a\alpha + b\gamma$, we deduce that $A = f(\alpha, \gamma)$ for some $\alpha, \gamma \in \mathbb{Q}$. We
can select any $\beta, \delta \in \mathbb{Q}$ for which $\alpha\delta - \beta\gamma = 1$ to obtain a transformation taking $f$ to a form
$Ax^2 + b'xy + c'y^2$. We now let $x = X + kY$, $y = Y$, where $k$ is chosen so that $2Ak + b' = B$,
to obtain a form $AX^2 + BXY + C'Y^2$. Since both transformations have determinant 1, we
see that $B^2 - 4AC' = d$, and so $C' = C$. Hence $f$ and $F$ are equivalent over the rationals.


Exercise 12.15.1(a). Use Euler's criterion and Corollary 7.5.2.
Exercise 12.15.3(c). Use exercise 12.15.2(c).
Exercise 12.18.2(a). If $N$ is even and $N = a^2 + b^2 + c^2 + d^2$ with $a \equiv b \pmod 2$ and $c \equiv d \pmod 2$,
then $N/2 = \left(\frac{a+b}{2}\right)^2 + \left(\frac{a-b}{2}\right)^2 + \left(\frac{c+d}{2}\right)^2 + \left(\frac{c-d}{2}\right)^2$. If $N \equiv 1 \pmod 4$ with $N = a^2 + b^2 + c^2 + d^2$,
then we may let $a$ be odd and the rest even. To obtain representations of $2N$ we have the
first two squares as $(a + b)^2 + (a - b)^2$, the other two even. This yields back $a$ and the
choice of $b$, and so it is a 1-to-3 map. We have a similar construction if $N \equiv 3 \pmod 4$.
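The halving identity in the first sentence is easy to confirm exhaustively on a small box of integers. A sketch (the function name `halve` is ours):

```python
def halve(a, b, c, d):
    """Given a ≡ b and c ≡ d (mod 2), return four integers whose squares sum to (a^2+b^2+c^2+d^2)/2."""
    assert (a - b) % 2 == 0 and (c - d) % 2 == 0
    return (a + b) // 2, (a - b) // 2, (c + d) // 2, (c - d) // 2

print(halve(3, 1, 2, 0))  # (2, 1, 1, 1): 9 + 1 + 4 + 0 = 14 and 4 + 1 + 1 + 1 = 7
```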


Exercise 12.18.3(c). Use Legendre's Theorem (Theorem 12.5). (d) Let $u = a + b - c - d$, $v =$
$a - b + c - d$, $w = a - b - c + d$, etc. (e) Be careful with the cases where $u = v$, etc.


EXERCISES IN CHAPTER 13 
Exercise [13.7.a/d). If $x = 10^K - 1$, then the integers in $S_a$ that are $\le x$ are the union of
the sets $\{a \cdot 10^k \le n < (a+1) \cdot 10^k\}$ for $k = 0, 1, 2, \dots, K - 1$.


Exercise 13.7.4. One way is to factor the numerator and denominator and note that
$(m+1)^2 - (m+1) + 1 = m^2 + m + 1$.
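One product where exactly this identity makes the factors telescope is $\prod_{m \ge 2} \frac{m^3 - 1}{m^3 + 1}$, since $m^3 - 1 = (m-1)(m^2 + m + 1)$ and $m^3 + 1 = (m+1)(m^2 - m + 1)$; we are assuming this is the kind of product the exercise has in mind. The partial products then equal $\frac{2(N^2 + N + 1)}{3N(N+1)}$, which tends to $2/3$. An exact check with fractions:

```python
from fractions import Fraction

def partial_product(N):
    """prod_{m=2}^{N} (m^3 - 1)/(m^3 + 1) as an exact fraction."""
    prod = Fraction(1)
    for m in range(2, N + 1):
        prod *= Fraction(m ** 3 - 1, m ** 3 + 1)
    return prod

print(partial_product(3))  # 13/18
```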


EXERCISES IN CHAPTER 15 


Exercise 15.2.6(d). Proceed by induction, using the recursion in (c).


Exercise 15.7.1. Consider the sequence $0, a_1, a_1 + a_2, \dots, a_1 + \cdots + a_n \pmod n$, and use
the pigeonhole principle.
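The pigeonhole argument is constructive: among the $n + 1$ partial sums mod $n$ two coincide, and the block between them has sum divisible by $n$. A minimal sketch (the function name is ours):

```python
def zero_sum_block(a):
    """Return (i, j) with i < j such that a[i:j] has sum divisible by n = len(a)."""
    n = len(a)
    seen = {0: 0}            # partial sum mod n -> number of terms used
    s = 0
    for j, x in enumerate(a, start=1):
        s = (s + x) % n
        if s in seen:
            return seen[s], j
        seen[s] = j
    raise AssertionError("unreachable by the pigeonhole principle")

print(zero_sum_block([3, 1, 4, 1, 5]))  # (1, 3): 1 + 4 = 5 is divisible by 5
```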


Exercise 15.7.3. Remove the first greedily constructed increasing subsequence starting
with $a_1$, so the second term is $a_r$, where $r$ is the minimal integer for which $a_r > a_1$.
Suppose this has length $\ell$. If the remaining subsequence has parameters $m', n'$, show that
$n \ge n' + 1$ and $m \ge \max\{m', \ell\}$. Then complete the proof by an induction hypothesis.




EXERCISES IN CHAPTER 16 


Exercise 16.2.1. Assume $(ab, p) = 1$; let $d$ be the order of $p \pmod b$. Let $A = a(p^d - 1)/b$,
and then $C = p^d - 1 - A$. Write $C \pmod{p^d}$ in base $p$ as $c_0 + c_1 p + \cdots + c_{d-1} p^{d-1}$. Then
$a/b = 1 + \sum_{j \ge 0} c_j p^j$, where $c_j = c_k$ when $k$ is the least non-negative residue of $j \pmod d$.
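The construction can be checked by truncating the series mod $p^N$ and comparing with $a \cdot b^{-1} \bmod p^N$. The sketch assumes $0 < a < b$ (so that $C \ge 0$) and $\gcd(b, p) = 1$; the function names are ours, and `pow(b, -1, m)` needs Python 3.8 or later:

```python
def periodic_expansion(a, b, p):
    """Return (d, [c_0, ..., c_{d-1}]) with a/b = 1 + sum_j c_{j mod d} p^j p-adically.

    Assumes 0 < a < b and gcd(b, p) = 1, as in the hint.
    """
    d = 1
    while pow(p, d, b) != 1 % b:      # d = order of p mod b
        d += 1
    A = a * (p ** d - 1) // b
    C = p ** d - 1 - A
    digits = []
    for _ in range(d):                # base-p digits of C
        digits.append(C % p)
        C //= p
    return d, digits

def truncated_value(d, digits, p, N):
    """1 + sum_{j<N} c_{j mod d} p^j, reduced mod p^N."""
    return (1 + sum(digits[j % d] * p ** j for j in range(N))) % p ** N

d, digits = periodic_expansion(1, 7, 2)
print(d, digits)                                                 # 3 [0, 1, 1]
print(truncated_value(d, digits, 2, 12) == pow(7, -1, 2 ** 12))  # True
```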
Exercise 16.6.5(b). Use the identity $AB - 1 = (A - 1) + A(B - 1)$.

Exercise 16.7.2(a). Differentiate this expression.


Recommended further reading


[AZ18] Martin Aigner and Günter M. Ziegler, Proofs from The Book, sixth ed., Springer, Berlin,
2018.

[Bak84] Alan Baker, A concise introduction to the theory of numbers, Cambridge University Press,
Cambridge, 1984. MR781734

[BB09] Arthur T. Benjamin and Ezra Brown (eds.), Biscuits of number theory, The Dolciani Mathematical
Expositions, vol. 34, Mathematical Association of America, Washington, DC, 2009.
MR2516529

[Cas78] J. W. S. Cassels, Rational quadratic forms, London Mathematical Society Monographs,
vol. 13, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York,
1978. MR522835

[CG96] John H. Conway and Richard K. Guy, The book of numbers, Copernicus, New York, 1996.
MR1411676

[Cox13] David A. Cox, Primes of the form $x^2 + ny^2$, second ed., Pure and Applied Mathematics
(Hoboken), John Wiley & Sons, Inc., Hoboken, NJ, 2013. MR3236783

[CP05] Richard Crandall and Carl Pomerance, Prime numbers: A computational perspective, second
ed., Springer, New York, 2005. MR2156291

[Dav80] Harold M. Davenport, Multiplicative number theory, Springer-Verlag, New York, 1980.

[Dav05] H. Davenport, Analytic methods for Diophantine equations and Diophantine inequalities,
second ed., Cambridge Mathematical Library, Cambridge University Press, Cambridge, 2005.

[DF04] David S. Dummit and Richard M. Foote, Abstract algebra, third ed., John Wiley & Sons, Inc.,
Hoboken, NJ, 2004. MR2286236

[Edw01] H. M. Edwards, Riemann's zeta function, Dover Publications, Inc., Mineola, NY, 2001.
MR1854455

[GG] Andrew Granville and Ben Green, Additive combinatorics, American Mathematical Society
(to appear).

[Graa] Andrew Granville, The distribution of primes: Analytic number theory revealed, American
Mathematical Society (to appear).

[Grab] Andrew Granville, Rational points on curves: Arithmetic geometry revealed, American
Mathematical Society (to appear).

[GS] Andrew Granville and K. Soundararajan, The pretentious approach to analytic number
theory, Cambridge University Press (to appear).

[Guy04] Richard K. Guy, Unsolved problems in number theory, third ed., Problem Books in
Mathematics, Springer-Verlag, New York, 2004. MR2076335

[HW08] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, sixth ed., revised by
D. R. Heath-Brown and J. H. Silverman, Oxford University Press, Oxford, 2008. MR2445243

[IR90] Kenneth Ireland and Michael Rosen, A classical introduction to modern number theory,
second ed., Graduate Texts in Mathematics, vol. 84, Springer-Verlag, New York, 1990.
MR1070716

[Knu98] Donald E. Knuth, The art of computer programming. Vol. 2, Seminumerical algorithms, third
edition [of MR0286318], Addison-Wesley, Reading, MA, 1998. MR3077153

[LeV96] William J. LeVeque, Fundamentals of number theory, reprint of the 1977 original, Dover
Publications, Inc., Mineola, NY, 1996. MR1382656

[NZM91] Ivan Niven, Herbert S. Zuckerman, and Hugh L. Montgomery, An introduction to the theory
of numbers, fifth ed., John Wiley & Sons, Inc., New York, 1991. MR1083765

[Rib91] Paulo Ribenboim, The little book of big primes, Springer-Verlag, New York, 1991. MR1118843

[Sha85] Daniel Shanks, Solved and unsolved problems in number theory, third ed., Chelsea Publishing
Co., New York, 1985. MR798284

[ST15] Joseph H. Silverman and John T. Tate, Rational points on elliptic curves, second ed.,
Undergraduate Texts in Mathematics, Springer, Cham, 2015. MR3363545

[Ste09] William Stein, Elementary number theory: Primes, congruences, and secrets. A computational
approach, Undergraduate Texts in Mathematics, Springer, New York, 2009. MR2464052

[Tig16] Jean-Pierre Tignol, Galois' theory of algebraic equations, second ed., World Scientific
Publishing Co. Pte. Ltd., Hackensack, NJ, 2016. MR3144922

[TMF00] Gérald Tenenbaum and Michel Mendès France, The prime numbers and their distribution,
translated from the 1997 French original by Philip G. Spain, Student Mathematical Library,
vol. 6, American Mathematical Society, Providence, RI, 2000. MR1756233

[VE87] Charles Vanden Eynden, Elementary number theory, The Random House/Birkhäuser
Mathematics Series, Random House, Inc., New York, 1987. MR943119

[Wat14] John J. Watkins, Number theory: A historical approach, Princeton University Press,
Princeton, NJ, 2014. MR3237512

[Zei17] Paul Zeitz, The art and craft of problem solving, third edition [of MR1674658], John Wiley
& Sons, Inc., Hoboken, NJ, 2017. MR3617426


Index


p-adics,
abc-conjecture, 223, 226, 414, 416

Algebraic numbers and integers,
Algebraic units,

Bernoulli numbers and polynomials,
Binary quadratic forms, 456, 458, 459, 465, 466, 468–470
Binomial coefficients, 4, 9, 68, 99, 172

Carmichael λ-function,
Carmichael numbers, 387, 388
Catalan equation,
Chinese Remainder Theorem,
Class group and composition, 456–461, 463–466
Class number,
Computation and running times,
Congruent number problem, 221, 229
Constructibility and pre-Galois theory,
Continued fractions, 39, 47, 423, 470
Convolutions,
Covering systems,
Cryptography,
Cyclotomic polynomials,

Descent, 219, 223, 229, 360, 361, 410
Diophantine problems, 215, 218, 220
Dirichlet L-functions,
Dirichlet characters,
Discrete logs, 260
Discriminants,
Divisibility tests,
Divisors (incl. gcds), 84, 85, 129, 147
Dynamics, 48, 208, 360, 367, 364, 420, 482

Egyptian fractions,
Elliptic curves,
Euclidean algorithm, 33, 40, 41, 45, 47
Euclidean domains, 78
Euler's φ-function, 128
Euler's criterion,

Factoring methods,




Fermat numbers, 249, 280, 303, 313, 372, 379
Fermat quotients,
Fermat's Last Theorem, 225, 249, 279
Fermat's Little Theorem,
Fermat–Catalan conjecture, 417
Fibonacci numbers, 1, 14, 42, 47, 69
Finite fields,
Frobenius postage stamp problem, 57
Fundamental discriminants,
Fundamental Theorem of Algebra,
Fundamental Theorem of Arithmetic,

Generating functions,
Groups, 119, 7, 74, 109, 269, 271, 326

Heuristics,

Ideals, 37, 43, 47
Irrational numbers, 24, 87, 97, 223

Jacobi symbol,

Lattice points,
Legendre symbol,
Lifting solutions mod $p^k$, 266
Linear algebra, 443, 445
Local-global principle, 343, 447
Lucas sequence,

Magic squares,
Mahler measure, 540
Matrices and matrix groups, 47, 49, 105, 180, 276, 361, 364, 424, 482–484
Mersenne numbers, 8, 44, 69, 96, 130
Möbius function,
Modularity, 519
Multiplication table problem,

Orders (of elements), 73

Pascal's triangle, 5, 99
Pell's equation, 406, 408, 427, 428, 437
Pell's equation; negative, 432
Perfect numbers,
Polynomial properties,
Power residues,
Primality testing,
Prime k-tuplets conjecture,
Prime factors: number of,
Primes in arithmetic progressions, 159
Primes: infinitely many,
Primes: number of, 184, 185, 187, 194
Primitive roots, 242, 243, 246, 258, 260
Pseudoprimes,
Pythagorean triangle,

Quadratic fields,
Quadratic forms,
Quadratic reciprocity (Law of), 300, 301, 303, 305, 309, 313, 315, 323
Quadratic residues / non-residues, 296, 299
Quadratic residues and non-residues; least, 297, 310, 319




Residues (mod n), 61, 262, 295, 305, 315, 328, 405
Resultants and discriminants, 77
Riemann zeta-function, 182, 184, 186, 192
Rings and fields,
Roots of polynomials, 15, 267, 275, 537, 540

Second-order linear recurrence sequences, 290, 385
Square roots (mod n),
Sums of (more than two) squares, 471
Sums of powers of integers, 3, 9, 287, 292
Sums of two squares,
Sumsets, 519, 521–524, 530, 532

Tiling, 41
Transcendental numbers,

Unique factorization, 272

Waring's problem,
Wilson's Theorem,


Drawing by Robert J. Lewis. Courtesy of Princeton University Press. 


Number Theory Revealed: A Masterclass acquaints enthusiastic students with the “Queen 
of Mathematics”. The text offers a fresh take on congruences, power residues, quadratic 
residues, primes, and Diophantine equations and presents hot topics like cryptography, 
factoring, and primality testing. Students are also introduced to beautiful enlightening 
questions like the structure of Pascal's triangle mod p and modern twists on traditional 
questions like the values represented by binary quadratic forms, the anatomy of integers, 
and elliptic curves. 


This Masterclass edition contains many additional chapters and appendices not found 
in Number Theory Revealed: An Introduction, highlighting beautiful developments and 
inspiring other subjects in mathematics (like algebra). This allows instructors to tailor 
a course suited to their own (and their students’) interests. There are new yet accessible 
topics like the curvature of circles in a tiling of a circle by circles, the latest discoveries on 
gaps between primes, a new proof of Mordell’s Theorem for congruent elliptic curves, and 
a discussion of the abc-conjecture including its proof for polynomials. 


About the Author: 


Andrew Granville is the Canada Research Chair in Number Theory at the University of 
Montreal and professor of mathematics at University College London. He has won several 
international writing prizes for exposition in mathematics, including the 2008 Chauvenet 
Prize and the 2019 Halmos-Ford Prize, and is the author of Prime Suspects (Princeton 
University Press, 2019), a beautifully illustrated graphic novel murder mystery that 
explores surprising connections between the anatomies of integers and of permutations. 


ISBN 978-1-4704-4158-6

MBK/127

www.ams.org


